5rijan committed on
Commit
bf999a7
1 Parent(s): 8337431

Upload 11 files

Extended-Keyword-dataset 2.csv ADDED
@@ -0,0 +1,61 @@
1
+ Inputs;Keywords
2
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The integration of blockchain technology in supply chain management offers transparency and traceability. Companies can track products in real-time, reducing fraud and ensuring authenticity. This innovation also streamlines operations by automating transactions and improving data accuracy.;blockchain, supply chain management, transparency, traceability, real-time tracking, fraud reduction, authenticity, automated transactions, data accuracy
3
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence and machine learning are transforming the healthcare industry. AI-powered diagnostic tools are enhancing accuracy and speed in disease detection, while machine learning models predict patient outcomes and personalize treatment plans. This shift towards digital health is also driving innovations in telemedicine and remote monitoring.;AI, machine learning, healthcare, diagnostic tools, disease detection, patient outcomes, personalized treatment, digital health, telemedicine, remote monitoring
4
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable energy sources such as solar, wind, and geothermal are critical in combating climate change. These technologies reduce reliance on fossil fuels, decrease greenhouse gas emissions, and promote sustainable development. Governments and organizations worldwide are investing heavily in renewable energy infrastructure to ensure a greener future.;renewable energy, solar power, wind power, geothermal energy, climate change, fossil fuels, greenhouse gas emissions, sustainable development, energy infrastructure
5
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: the realm of education, online learning platforms have revolutionized access to knowledge. E-learning tools offer flexibility and personalized learning experiences, catering to diverse needs of students. This digital transformation in education is bridging gaps, providing opportunities for lifelong learning, and fostering global collaboration.;online learning, education, e-learning tools, flexibility, personalized learning, digital transformation, lifelong learning, global collaboration
6
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of Mars has gained momentum with recent advancements in space technology. Robotic missions have successfully landed on the Martian surface, collecting valuable data about its geology and potential for life. These endeavors are laying the groundwork for future human exploration and potential colonization of Mars.;Mars exploration, space technology, robotic missions, Martian geology, potential for life, human exploration, Mars colonization
7
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Cybersecurity remains a top priority as digital threats evolve. Organizations are investing in advanced security measures to protect sensitive data from cyber-attacks. Innovations in cryptography and AI-based threat detection are crucial in safeguarding information and maintaining privacy in an increasingly connected world.;cybersecurity, digital threats, advanced security, sensitive data, cyber-attacks, cryptography, AI-based threat detection, information safeguarding, privacy
8
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The field of biotechnology is witnessing rapid growth with breakthroughs in genetic research. Techniques like CRISPR and gene therapy are opening new avenues for treating genetic disorders. These advancements are not only revolutionizing medicine but also have significant implications for agriculture and environmental conservation.;biotechnology, genetic research, CRISPR, gene therapy, genetic disorders, medical advancements, agriculture, environmental conservation
9
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Urbanization and smart city initiatives are shaping the future of urban living. Technologies such as IoT, AI, and big data are enhancing urban infrastructure, improving resource management, and providing better services to citizens. Smart cities aim to create sustainable and efficient urban environments for the growing population.;urbanization, smart cities, IoT, AI, big data, urban infrastructure, resource management, citizen services, sustainable urban environments
10
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The impact of social media on communication and society is profound. Platforms like Facebook, Twitter, and Instagram have changed the way people interact, share information, and form communities. While social media fosters connectivity, it also raises concerns about privacy, misinformation, and mental health.;social media, communication, society, Facebook, Twitter, Instagram, interaction, information sharing, online communities, privacy, misinformation, mental health
11
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Advances in quantum physics are paving the way for quantum computing. Quantum computers leverage the principles of quantum mechanics to perform complex calculations at unprecedented speeds. This technology has the potential to revolutionize fields such as cryptography, materials science, and artificial intelligence.;quantum physics, quantum computing, quantum mechanics, complex calculations, cryptography, materials science, artificial intelligence
12
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The world around us is evolving with the new age of artificial intelligence, and this has changed the way we think and work. Jobs are getting created and destroyed at a faster rate.;social, internet age, AI, AI impact, job market
13
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Quantum computing promises to revolutionize numerous fields by providing unprecedented computational power, potentially solving problems deemed unsolvable by classical computers.;quantum computing, computational power, revolutionize, classical computers, unsolvable problems
14
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Climate change is an urgent global issue, requiring immediate and sustained efforts to reduce greenhouse gas emissions and mitigate the impact on ecosystems and human societies.;climate change, global issue, greenhouse gases, mitigation, ecosystems, human societies
15
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The discovery of CRISPR-Cas9 has opened new frontiers in genetic engineering, allowing precise editing of DNA and presenting opportunities for advancements in medicine and agriculture.;CRISPR-Cas9, genetic engineering, DNA editing, medical advancements, agricultural advancements
16
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Machine learning algorithms are increasingly being used in finance to predict market trends, optimize investment strategies, and enhance risk management practices.;machine learning, finance, market prediction, investment strategies, risk management
17
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The study of dark matter and dark energy is crucial in understanding the composition and expansion of the universe, as they constitute the majority of its mass-energy content.;dark matter, dark energy, universe composition, expansion, mass-energy content
18
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The field of renewable energy technologies, including solar, wind, and hydropower, is essential for sustainable development and reducing dependence on fossil fuels.;renewable energy, solar power, wind power, hydropower, sustainable development, fossil fuels
19
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in neuroscience are shedding light on brain function, offering insights into neurological disorders and potential therapeutic interventions.;neuroscience, brain function, neurological disorders, therapeutic interventions
20
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The advent of 5G technology promises to enhance connectivity and enable a range of new applications, from smart cities to autonomous vehicles, revolutionizing daily life.;5G technology, connectivity, smart cities, autonomous vehicles, daily life revolution
21
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Big data analytics is transforming healthcare by providing insights from large datasets, improving patient outcomes, and enabling personalized medicine.;big data analytics, healthcare, large datasets, patient outcomes, personalized medicine
22
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The advancement of electric vehicles (EVs) is a significant step towards sustainable transportation. EVs reduce greenhouse gas emissions and reliance on fossil fuels. Innovations in battery technology and charging infrastructure are crucial to the widespread adoption of EVs.;electric vehicles, sustainable transportation, greenhouse gas emissions, fossil fuels, battery technology, charging infrastructure
23
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The study of human microbiomes is revolutionizing our understanding of health and disease. Microbiome research uncovers the complex interactions between microbes and their human hosts, leading to new insights into conditions such as obesity, diabetes, and mental health disorders.;human microbiomes, health, disease, microbiome research, microbes, human hosts, obesity, diabetes, mental health disorders
24
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence in agriculture is enhancing crop management and productivity. AI-driven technologies like precision farming and predictive analytics help farmers optimize resource use, increase yields, and improve sustainability.;AI, agriculture, crop management, productivity, precision farming, predictive analytics, resource optimization, yields, sustainability
25
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The field of nanotechnology is opening new frontiers in medicine. Nanoscale materials and devices are being developed for targeted drug delivery, improved imaging, and early disease detection. These innovations hold promise for more effective and less invasive medical treatments.;nanotechnology, medicine, nanoscale materials, targeted drug delivery, imaging, disease detection, medical treatments
26
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in renewable energy storage systems, such as batteries and supercapacitors, are addressing the intermittency of solar and wind power. Improved energy storage solutions are key to integrating renewable energy into the grid and ensuring a stable power supply.;renewable energy storage, batteries, supercapacitors, solar power, wind power, energy storage solutions, power grid, stable power supply
27
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The application of machine learning in natural language processing (NLP) is transforming how computers understand and generate human language. NLP technologies are used in various applications, including translation, sentiment analysis, and chatbots, improving human-computer interaction.;machine learning, natural language processing, NLP, human language, translation, sentiment analysis, chatbots, human-computer interaction
28
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: CRISPR technology continues to make headlines with its potential to edit genes with precision. This groundbreaking technology is being explored for treating genetic disorders, enhancing crop resilience, and even combating climate change by altering the genes of carbon-sequestering plants.;CRISPR, gene editing, genetic disorders, crop resilience, climate change, carbon-sequestering plants
29
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The proliferation of Internet of Things (IoT) devices is reshaping various industries. IoT technology enables the collection and analysis of vast amounts of data, leading to smarter homes, efficient industrial processes, and improved healthcare outcomes.;Internet of Things, IoT, data collection, data analysis, smart homes, industrial processes, healthcare outcomes
30
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of the deep sea is revealing new species and ecosystems. Advanced submersibles and remote-operated vehicles (ROVs) allow scientists to study previously inaccessible ocean depths, contributing to our knowledge of marine biology and environmental conservation.;deep sea exploration, new species, ecosystems, submersibles, remote-operated vehicles, ROVs, marine biology, environmental conservation
31
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Blockchain technology is being adopted in the financial sector to enhance security and transparency. Cryptocurrencies, smart contracts, and decentralized finance (DeFi) platforms are revolutionizing the way financial transactions are conducted and recorded.;blockchain, financial sector, security, transparency, cryptocurrencies, smart contracts, decentralized finance, DeFi, financial transactions
32
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Cultural anthropology studies human societies and cultures, examining customs, social structures, and cultural evolution.;cultural anthropology, human societies, cultures, customs, social structures, cultural evolution
33
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Shakespeare's plays are renowned for their exploration of themes such as ambition, betrayal, and the human condition, making them enduring classics in literature.;Shakespeare, plays, ambition, betrayal, human condition, literature classics
34
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Ecological restoration involves rehabilitating ecosystems that have been degraded, damaged, or destroyed, aiming to return them to a healthy and functional state.;ecological restoration, ecosystem rehabilitation, degradation, damage, functional state
35
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Behavioral psychology focuses on the study of observable behavior and its modification through conditioning, reinforcement, and learning processes.;behavioral psychology, observable behavior, conditioning, reinforcement, learning processes
36
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Historical archaeology examines the material remains of past societies to understand historical developments and cultural changes.;historical archaeology, material remains, past societies, historical developments, cultural changes
37
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable agriculture promotes sustainable farming practices that reduce environmental impact and improve soil health through crop rotation, organic farming, and conservation tillage.;renewable agriculture, sustainable farming, environmental impact, soil health, crop rotation, organic farming, conservation tillage
38
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Public health policy addresses the planning and implementation of measures to improve community health, including vaccination programs and health education.;public health policy, community health, vaccination programs, health education, planning, implementation
39
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Medieval history explores the social, political, and cultural aspects of the Middle Ages, including feudalism, the Crusades, and the Black Death.;medieval history, Middle Ages, feudalism, Crusades, Black Death, social aspects, political aspects, cultural aspects
40
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Quantum mechanics is a fundamental theory in physics that describes the behavior of particles at the atomic and subatomic levels.;quantum mechanics, fundamental theory, physics, particle behavior, atomic level, subatomic level
41
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Sociolinguistics examines how language use varies and changes in social contexts, including factors like region, class, and gender.;sociolinguistics, language use, social contexts, region, class, gender, language variation
42
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Hydraulic engineering involves the design and management of water resources, including flood control, irrigation systems, and hydropower generation.;hydraulic engineering, water resources, flood control, irrigation systems, hydropower generation
43
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The Renaissance period was marked by a revival of art, literature, and learning in Europe, significantly influenced by figures like Leonardo da Vinci and Michelangelo.;Renaissance period, art revival, literature, learning, Europe, Leonardo da Vinci, Michelangelo
44
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Environmental ethics considers the moral relationship between humans and the natural world, focusing on issues like conservation, biodiversity, and sustainability.;environmental ethics, moral relationship, natural world, conservation, biodiversity, sustainability
45
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Game theory studies strategic interactions where the outcome for each participant depends on the actions of others, with applications in economics, politics, and biology.;game theory, strategic interactions, outcome dependence, economics, politics, biology
46
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Paleontology investigates the history of life on Earth through the study of fossils, providing insights into ancient ecosystems and evolutionary processes.;paleontology, history of life, fossils, ancient ecosystems, evolutionary processes
47
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Neuropsychology explores the relationship between brain function and behavior, aiding in the diagnosis and treatment of neurological and psychological disorders.;neuropsychology, brain function, behavior, diagnosis, treatment, neurological disorders, psychological disorders
48
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Classical music, spanning from the Baroque to the Romantic period, includes works by composers like Bach, Mozart, and Beethoven, known for their complex compositions and enduring influence.;classical music, Baroque period, Romantic period, Bach, Mozart, Beethoven, complex compositions, enduring influence
49
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Urban sociology studies the social structures, interactions, and experiences within cities, focusing on issues like urbanization, segregation, and community dynamics.;urban sociology, social structures, interactions, cities, urbanization, segregation, community dynamics
50
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable natural resources management involves sustainable practices in utilizing forests, fisheries, and wildlife to ensure their availability for future generations.;renewable natural resources, sustainable practices, forests, fisheries, wildlife, availability, future generations
51
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Bioinformatics applies computational methods to analyze and interpret biological data, advancing research in areas like genomics and proteomics.;bioinformatics, computational methods, biological data analysis, genomics, proteomics
52
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Social media studies examine the impact of platforms like Facebook, Twitter, and Instagram on communication, social behavior, and public opinion.;social media studies, platform impact, communication, social behavior, public opinion, Facebook, Twitter, Instagram
53
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Modernist literature, emerging in the late 19th and early 20th centuries, is characterized by a break with traditional forms and a focus on stream-of-consciousness narrative and fragmented structure.;modernist literature, traditional forms, stream-of-consciousness, narrative, fragmented structure, 19th century, 20th century
54
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Human rights law focuses on protecting individual freedoms and ensuring justice, addressing issues like discrimination, freedom of speech, and fair trial.;human rights law, individual freedoms, justice, discrimination, freedom of speech, fair trial
55
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Biomedical engineering integrates principles from biology and engineering to develop technologies for healthcare, including medical devices and diagnostic tools.;biomedical engineering, biology, engineering, healthcare technologies, medical devices, diagnostic tools
56
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Political philosophy explores concepts like justice, power, and the role of government, contributing to the development of political systems and ideologies.;political philosophy, justice, power, government role, political systems, ideologies
57
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable energy policy involves the formulation of regulations and incentives to promote the adoption of renewable energy sources and reduce reliance on fossil fuels.;renewable energy policy, regulations, incentives, renewable energy adoption, fossil fuels reduction
58
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Environmental chemistry studies the chemical processes that occur in natural environments, including the effects of pollutants and the behavior of chemical compounds in ecosystems.;environmental chemistry, chemical processes, natural environments, pollutants, chemical compounds, ecosystems
59
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence ethics addresses the moral implications of AI development and deployment, including issues like bias, accountability, and societal impact.;AI ethics, moral implications, AI development, deployment, bias, accountability, societal impact
60
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Architectural engineering focuses on the design and construction of buildings, integrating principles of structural engineering, environmental systems, and aesthetic considerations.;architectural engineering, building design, construction, structural engineering, environmental systems, aesthetic considerations
61
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Marine conservation aims to protect and restore ocean ecosystems, addressing threats like overfishing, pollution, and habitat destruction.;marine conservation, ocean ecosystems, protection, restoration, overfishing, pollution, habitat destruction
Extended-Keyword-dataset.csv ADDED
@@ -0,0 +1,61 @@
1
+ Inputs,Keywords
2
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The integration of blockchain technology in supply chain management offers transparency and traceability. Companies can track products in real-time, reducing fraud and ensuring authenticity. This innovation also streamlines operations by automating transactions and improving data accuracy.","blockchain, supply chain management, transparency, traceability, real-time tracking, fraud reduction, authenticity, automated transactions, data accuracy"
3
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence and machine learning are transforming the healthcare industry. AI-powered diagnostic tools are enhancing accuracy and speed in disease detection, while machine learning models predict patient outcomes and personalize treatment plans. This shift towards digital health is also driving innovations in telemedicine and remote monitoring.","AI, machine learning, healthcare, diagnostic tools, disease detection, patient outcomes, personalized treatment, digital health, telemedicine, remote monitoring"
4
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Renewable energy sources such as solar, wind, and geothermal are critical in combating climate change. These technologies reduce reliance on fossil fuels, decrease greenhouse gas emissions, and promote sustainable development. Governments and organizations worldwide are investing heavily in renewable energy infrastructure to ensure a greener future.","renewable energy, solar power, wind power, geothermal energy, climate change, fossil fuels, greenhouse gas emissions, sustainable development, energy infrastructure"
5
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: the realm of education, online learning platforms have revolutionized access to knowledge. E-learning tools offer flexibility and personalized learning experiences, catering to diverse needs of students. This digital transformation in education is bridging gaps, providing opportunities for lifelong learning, and fostering global collaboration.","online learning, education, e-learning tools, flexibility, personalized learning, digital transformation, lifelong learning, global collaboration"
6
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of Mars has gained momentum with recent advancements in space technology. Robotic missions have successfully landed on the Martian surface, collecting valuable data about its geology and potential for life. These endeavors are laying the groundwork for future human exploration and potential colonization of Mars.","Mars exploration, space technology, robotic missions, Martian geology, potential for life, human exploration, Mars colonization"
7
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: Cybersecurity remains a top priority as digital threats evolve. Organizations are investing in advanced security measures to protect sensitive data from cyber-attacks. Innovations in cryptography and AI-based threat detection are crucial in safeguarding information and maintaining privacy in an increasingly connected world.,"cybersecurity, digital threats, advanced security, sensitive data, cyber-attacks, cryptography, AI-based threat detection, information safeguarding, privacy"
8
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The field of biotechnology is witnessing rapid growth with breakthroughs in genetic research. Techniques like CRISPR and gene therapy are opening new avenues for treating genetic disorders. These advancements are not only revolutionizing medicine but also have significant implications for agriculture and environmental conservation.,"biotechnology, genetic research, CRISPR, gene therapy, genetic disorders, medical advancements, agriculture, environmental conservation"
9
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Urbanization and smart city initiatives are shaping the future of urban living. Technologies such as IoT, AI, and big data are enhancing urban infrastructure, improving resource management, and providing better services to citizens. Smart cities aim to create sustainable and efficient urban environments for the growing population.","urbanization, smart cities, IoT, AI, big data, urban infrastructure, resource management, citizen services, sustainable urban environments"
10
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The impact of social media on communication and society is profound. Platforms like Facebook, Twitter, and Instagram have changed the way people interact, share information, and form communities. While social media fosters connectivity, it also raises concerns about privacy, misinformation, and mental health.","social media, communication, society, Facebook, Twitter, Instagram, interaction, information sharing, online communities, privacy, misinformation, mental health"
11
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Advances in quantum physics are paving the way for quantum computing. Quantum computers leverage the principles of quantum mechanics to perform complex calculations at unprecedented speeds. This technology has the potential to revolutionize fields such as cryptography, materials science, and artificial intelligence.","quantum physics, quantum computing, quantum mechanics, complex calculations, cryptography, materials science, artificial intelligence"
12
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The world around us is evolving with the new age of artificial intelligence, and this has changed the way we think and work. Jobs are getting created and destroyed at a faster rate.","social, internet age, AI, AI impact, job market"
13
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Quantum computing promises to revolutionize numerous fields by providing unprecedented computational power, potentially solving problems deemed unsolvable by classical computers.","quantum computing, computational power, revolutionize, classical computers, unsolvable problems"
14
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Climate change is an urgent global issue, requiring immediate and sustained efforts to reduce greenhouse gas emissions and mitigate the impact on ecosystems and human societies.","climate change, global issue, greenhouse gases, mitigation, ecosystems, human societies"
15
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The discovery of CRISPR-Cas9 has opened new frontiers in genetic engineering, allowing precise editing of DNA and presenting opportunities for advancements in medicine and agriculture.","CRISPR-Cas9, genetic engineering, DNA editing, medical advancements, agricultural advancements"
16
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Machine learning algorithms are increasingly being used in finance to predict market trends, optimize investment strategies, and enhance risk management practices.","machine learning, finance, market prediction, investment strategies, risk management"
17
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The study of dark matter and dark energy is crucial in understanding the composition and expansion of the universe, as they constitute the majority of its mass-energy content.","dark matter, dark energy, universe composition, expansion, mass-energy content"
18
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The field of renewable energy technologies, including solar, wind, and hydropower, is essential for sustainable development and reducing dependence on fossil fuels.","renewable energy, solar power, wind power, hydropower, sustainable development, fossil fuels"
19
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in neuroscience are shedding light on brain function, offering insights into neurological disorders and potential therapeutic interventions.","neuroscience, brain function, neurological disorders, therapeutic interventions"
20
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The advent of 5G technology promises to enhance connectivity and enable a range of new applications, from smart cities to autonomous vehicles, revolutionizing daily life.","5G technology, connectivity, smart cities, autonomous vehicles, daily life revolution"
21
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Big data analytics is transforming healthcare by providing insights from large datasets, improving patient outcomes, and enabling personalized medicine.","big data analytics, healthcare, large datasets, patient outcomes, personalized medicine"
22
+ 5-15 keyword for the following paragraph to summarise the whole academic scope: The advancement of electric vehicles (EVs) is a significant step towards sustainable transportation. EVs reduce greenhouse gas emissions and reliance on fossil fuels. Innovations in battery technology and charging infrastructure are crucial to the widespread adoption of EVs.,"electric vehicles, sustainable transportation, greenhouse gas emissions, fossil fuels, battery technology, charging infrastructure"
23
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The study of human microbiomes is revolutionizing our understanding of health and disease. Microbiome research uncovers the complex interactions between microbes and their human hosts, leading to new insights into conditions such as obesity, diabetes, and mental health disorders.","human microbiomes, health, disease, microbiome research, microbes, human hosts, obesity, diabetes, mental health disorders"
24
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Artificial intelligence in agriculture is enhancing crop management and productivity. AI-driven technologies like precision farming and predictive analytics help farmers optimize resource use, increase yields, and improve sustainability.","AI, agriculture, crop management, productivity, precision farming, predictive analytics, resource optimization, yields, sustainability"
25
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The field of nanotechnology is opening new frontiers in medicine. Nanoscale materials and devices are being developed for targeted drug delivery, improved imaging, and early disease detection. These innovations hold promise for more effective and less invasive medical treatments.","nanotechnology, medicine, nanoscale materials, targeted drug delivery, imaging, disease detection, medical treatments"
26
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Advancements in renewable energy storage systems, such as batteries and supercapacitors, are addressing the intermittency of solar and wind power. Improved energy storage solutions are key to integrating renewable energy into the grid and ensuring a stable power supply.","renewable energy storage, batteries, supercapacitors, solar power, wind power, energy storage solutions, power grid, stable power supply"
27
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The application of machine learning in natural language processing (NLP) is transforming how computers understand and generate human language. NLP technologies are used in various applications, including translation, sentiment analysis, and chatbots, improving human-computer interaction.","machine learning, natural language processing, NLP, human language, translation, sentiment analysis, chatbots, human-computer interaction"
28
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: CRISPR technology continues to make headlines with its potential to edit genes with precision. This groundbreaking technology is being explored for treating genetic disorders, enhancing crop resilience, and even combating climate change by altering the genes of carbon-sequestering plants.","CRISPR, gene editing, genetic disorders, crop resilience, climate change, carbon-sequestering plants"
29
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The proliferation of Internet of Things (IoT) devices is reshaping various industries. IoT technology enables the collection and analysis of vast amounts of data, leading to smarter homes, efficient industrial processes, and improved healthcare outcomes.","Internet of Things, IoT, data collection, data analysis, smart homes, industrial processes, healthcare outcomes"
30
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: The exploration of the deep sea is revealing new species and ecosystems. Advanced submersibles and remote-operated vehicles (ROVs) allow scientists to study previously inaccessible ocean depths, contributing to our knowledge of marine biology and environmental conservation.","deep sea exploration, new species, ecosystems, submersibles, remote-operated vehicles, ROVs, marine biology, environmental conservation"
31
+ "5-15 keyword for the following paragraph to summarise the whole academic scope: Blockchain technology is being adopted in the financial sector to enhance security and transparency. Cryptocurrencies, smart contracts, and decentralized finance (DeFi) platforms are revolutionizing the way financial transactions are conducted and recorded.","blockchain, financial sector, security, transparency, cryptocurrencies, smart contracts, decentralized finance, DeFi, financial transactions"
32
+ "Cultural anthropology studies human societies and cultures, examining customs, social structures, and cultural evolution.","cultural anthropology, human societies, cultures, customs, social structures, cultural evolution"
33
+ "Shakespeare's plays are renowned for their exploration of themes such as ambition, betrayal, and the human condition, making them enduring classics in literature.","Shakespeare, plays, ambition, betrayal, human condition, literature classics"
34
+ "Ecological restoration involves rehabilitating ecosystems that have been degraded, damaged, or destroyed, aiming to return them to a healthy and functional state.","ecological restoration, ecosystem rehabilitation, degradation, damage, functional state"
35
+ "Behavioral psychology focuses on the study of observable behavior and its modification through conditioning, reinforcement, and learning processes.","behavioral psychology, observable behavior, conditioning, reinforcement, learning processes"
36
+ Historical archaeology examines the material remains of past societies to understand historical developments and cultural changes.,"historical archaeology, material remains, past societies, historical developments, cultural changes"
37
+ "Renewable agriculture promotes sustainable farming practices that reduce environmental impact and improve soil health through crop rotation, organic farming, and conservation tillage.","renewable agriculture, sustainable farming, environmental impact, soil health, crop rotation, organic farming, conservation tillage"
38
+ "Public health policy addresses the planning and implementation of measures to improve community health, including vaccination programs and health education.","public health policy, community health, vaccination programs, health education, planning, implementation"
39
+ "Medieval history explores the social, political, and cultural aspects of the Middle Ages, including feudalism, the Crusades, and the Black Death.","medieval history, Middle Ages, feudalism, Crusades, Black Death, social aspects, political aspects, cultural aspects"
40
+ Quantum mechanics is a fundamental theory in physics that describes the behavior of particles at the atomic and subatomic levels.,"quantum mechanics, fundamental theory, physics, particle behavior, atomic level, subatomic level"
41
+ "Sociolinguistics examines how language use varies and changes in social contexts, including factors like region, class, and gender.","sociolinguistics, language use, social contexts, region, class, gender, language variation"
42
+ "Hydraulic engineering involves the design and management of water resources, including flood control, irrigation systems, and hydropower generation.","hydraulic engineering, water resources, flood control, irrigation systems, hydropower generation"
43
+ "The Renaissance period was marked by a revival of art, literature, and learning in Europe, significantly influenced by figures like Leonardo da Vinci and Michelangelo.","Renaissance period, art revival, literature, learning, Europe, Leonardo da Vinci, Michelangelo"
44
+ "Environmental ethics considers the moral relationship between humans and the natural world, focusing on issues like conservation, biodiversity, and sustainability.","environmental ethics, moral relationship, natural world, conservation, biodiversity, sustainability"
45
+ "Game theory studies strategic interactions where the outcome for each participant depends on the actions of others, with applications in economics, politics, and biology.","game theory, strategic interactions, outcome dependence, economics, politics, biology"
46
+ "Paleontology investigates the history of life on Earth through the study of fossils, providing insights into ancient ecosystems and evolutionary processes.","paleontology, history of life, fossils, ancient ecosystems, evolutionary processes"
47
+ "Neuropsychology explores the relationship between brain function and behavior, aiding in the diagnosis and treatment of neurological and psychological disorders.","neuropsychology, brain function, behavior, diagnosis, treatment, neurological disorders, psychological disorders"
48
+ "Classical music, spanning from the Baroque to the Romantic period, includes works by composers like Bach, Mozart, and Beethoven, known for their complex compositions and enduring influence.","classical music, Baroque period, Romantic period, Bach, Mozart, Beethoven, complex compositions, enduring influence"
49
+ "Urban sociology studies the social structures, interactions, and experiences within cities, focusing on issues like urbanization, segregation, and community dynamics.","urban sociology, social structures, interactions, cities, urbanization, segregation, community dynamics"
50
+ "Renewable natural resources management involves sustainable practices in utilizing forests, fisheries, and wildlife to ensure their availability for future generations.","renewable natural resources, sustainable practices, forests, fisheries, wildlife, availability, future generations"
51
+ "Bioinformatics applies computational methods to analyze and interpret biological data, advancing research in areas like genomics and proteomics.","bioinformatics, computational methods, biological data analysis, genomics, proteomics"
52
+ "Social media studies examine the impact of platforms like Facebook, Twitter, and Instagram on communication, social behavior, and public opinion.","social media studies, platform impact, communication, social behavior, public opinion, Facebook, Twitter, Instagram"
53
+ "Modernist literature, emerging in the late 19th and early 20th centuries, is characterized by a break with traditional forms and a focus on stream-of-consciousness narrative and fragmented structure.","modernist literature, traditional forms, stream-of-consciousness, narrative, fragmented structure, 19th century, 20th century"
54
+ "Human rights law focuses on protecting individual freedoms and ensuring justice, addressing issues like discrimination, freedom of speech, and fair trial.","human rights law, individual freedoms, justice, discrimination, freedom of speech, fair trial"
55
+ "Biomedical engineering integrates principles from biology and engineering to develop technologies for healthcare, including medical devices and diagnostic tools.","biomedical engineering, biology, engineering, healthcare technologies, medical devices, diagnostic tools"
56
+ "Political philosophy explores concepts like justice, power, and the role of government, contributing to the development of political systems and ideologies.","political philosophy, justice, power, government role, political systems, ideologies"
57
+ Renewable energy policy involves the formulation of regulations and incentives to promote the adoption of renewable energy sources and reduce reliance on fossil fuels.,"renewable energy policy, regulations, incentives, renewable energy adoption, fossil fuels reduction"
58
+ "Environmental chemistry studies the chemical processes that occur in natural environments, including the effects of pollutants and the behavior of chemical compounds in ecosystems.","environmental chemistry, chemical processes, natural environments, pollutants, chemical compounds, ecosystems"
59
+ "Artificial intelligence ethics addresses the moral implications of AI development and deployment, including issues like bias, accountability, and societal impact.","AI ethics, moral implications, AI development, deployment, bias, accountability, societal impact"
60
+ "Architectural engineering focuses on the design and construction of buildings, integrating principles of structural engineering, environmental systems, and aesthetic considerations.","architectural engineering, building design, construction, structural engineering, environmental systems, aesthetic considerations"
61
+ "Marine conservation aims to protect and restore ocean ecosystems, addressing threats like overfishing, pollution, and habitat destruction.","marine conservation, ocean ecosystems, protection, restoration, overfishing, pollution, habitat destruction"
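As a rough illustration of how rows like the ones above could be consumed, the sketch below parses comma-delimited, optionally quoted rows into a paragraph and a keyword list. The `SAMPLE` string holds two rows copied from this file; `parse_rows` is a hypothetical helper name, not something defined in the repository, and the sketch assumes every row has exactly two fields.

```python
import csv
import io

# Two rows copied from the dataset above; a real run would read the CSV file instead.
SAMPLE = '''"Marine conservation aims to protect and restore ocean ecosystems, addressing threats like overfishing, pollution, and habitat destruction.","marine conservation, ocean ecosystems, protection, restoration, overfishing, pollution, habitat destruction"
Historical archaeology examines the material remains of past societies to understand historical developments and cultural changes.,"historical archaeology, material remains, past societies, historical developments, cultural changes"
'''


def parse_rows(text):
    """Return (paragraph, [keywords]) pairs from two-column CSV text."""
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        paragraph, keywords = row
        # Keywords are stored as one comma-separated string in the second column.
        pairs.append((paragraph.strip(), [k.strip() for k in keywords.split(",")]))
    return pairs


pairs = parse_rows(SAMPLE)
```

Quoted and unquoted rows both parse, because the `csv` module only needs quotes when a field itself contains the delimiter.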
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 11,
+   "eos_token_id": 11,
+   "transformers_version": "4.31.0.dev0"
+ }
handler.py ADDED
@@ -0,0 +1,33 @@
+ import torch
+
+ from typing import Any, Dict, List
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+
+ class EndpointHandler:
+     def __init__(self, path=""):
+         # load model and tokenizer from path
+         self.tokenizer = AutoTokenizer.from_pretrained(path)
+         self.model = AutoModelForCausalLM.from_pretrained(
+             path, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True
+         )
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+
+     def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
+         # process input
+         inputs = data.pop("inputs", data)
+         parameters = data.pop("parameters", None)
+
+         # preprocess
+         inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)
+
+         # pass inputs along with any generation kwargs from "parameters"
+         if parameters is not None:
+             outputs = self.model.generate(**inputs, **parameters)
+         else:
+             outputs = self.model.generate(**inputs)
+
+         # postprocess the prediction
+         prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+         return [{"generated_text": prediction}]
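To make the request shape the handler expects a little more concrete, here is a minimal sketch that mirrors only its payload parsing, without loading a model. `split_payload` is a hypothetical helper written for this illustration, the prompt text is a placeholder, and the `max_new_tokens` and `do_sample` values are example generation arguments rather than settings taken from this repository.

```python
from typing import Any, Dict, Optional, Tuple


def split_payload(data: Dict[str, Any]) -> Tuple[Any, Optional[Dict[str, Any]]]:
    """Mirror the handler's parsing: pull out `inputs` and optional `parameters`."""
    data = dict(data)  # copy so the caller's dict is not mutated
    inputs = data.pop("inputs", data)
    parameters = data.pop("parameters", None)
    return inputs, parameters


# Placeholder request body; the prompt and parameter values are illustrative only.
payload = {
    "inputs": "5-15 keyword for the following paragraph to summarise the whole academic scope: ...",
    "parameters": {"max_new_tokens": 64, "do_sample": False},
}

inputs, parameters = split_payload(payload)
```

With a model loaded, the same payload would be passed as `EndpointHandler(path)(payload)`, and the handler returns a one-element list of the form `[{"generated_text": ...}]`.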
model.safetensors.index.json ADDED
@@ -0,0 +1,202 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 27686882816
4
+ },
5
+ "weight_map": {
6
+ "transformer.h.0.input_layernorm.bias": "model-00001-of-00015.safetensors",
7
+ "transformer.h.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
8
+ "transformer.h.0.mlp.dense_4h_to_h.weight": "model-00002-of-00015.safetensors",
9
+ "transformer.h.0.mlp.dense_h_to_4h.weight": "model-00001-of-00015.safetensors",
10
+ "transformer.h.0.self_attention.dense.weight": "model-00001-of-00015.safetensors",
11
+ "transformer.h.0.self_attention.query_key_value.weight": "model-00001-of-00015.safetensors",
12
+ "transformer.h.1.input_layernorm.bias": "model-00002-of-00015.safetensors",
13
+ "transformer.h.1.input_layernorm.weight": "model-00002-of-00015.safetensors",
14
+ "transformer.h.1.mlp.dense_4h_to_h.weight": "model-00002-of-00015.safetensors",
15
+ "transformer.h.1.mlp.dense_h_to_4h.weight": "model-00002-of-00015.safetensors",
16
+ "transformer.h.1.self_attention.dense.weight": "model-00002-of-00015.safetensors",
17
+ "transformer.h.1.self_attention.query_key_value.weight": "model-00002-of-00015.safetensors",
18
+ "transformer.h.10.input_layernorm.bias": "model-00005-of-00015.safetensors",
19
+ "transformer.h.10.input_layernorm.weight": "model-00005-of-00015.safetensors",
20
+ "transformer.h.10.mlp.dense_4h_to_h.weight": "model-00006-of-00015.safetensors",
21
+ "transformer.h.10.mlp.dense_h_to_4h.weight": "model-00006-of-00015.safetensors",
22
+ "transformer.h.10.self_attention.dense.weight": "model-00006-of-00015.safetensors",
23
+ "transformer.h.10.self_attention.query_key_value.weight": "model-00006-of-00015.safetensors",
24
+ "transformer.h.11.input_layernorm.bias": "model-00006-of-00015.safetensors",
25
+ "transformer.h.11.input_layernorm.weight": "model-00006-of-00015.safetensors",
26
+ "transformer.h.11.mlp.dense_4h_to_h.weight": "model-00006-of-00015.safetensors",
27
+ "transformer.h.11.mlp.dense_h_to_4h.weight": "model-00006-of-00015.safetensors",
28
+ "transformer.h.11.self_attention.dense.weight": "model-00006-of-00015.safetensors",
29
+ "transformer.h.11.self_attention.query_key_value.weight": "model-00006-of-00015.safetensors",
30
+ "transformer.h.12.input_layernorm.bias": "model-00006-of-00015.safetensors",
31
+ "transformer.h.12.input_layernorm.weight": "model-00006-of-00015.safetensors",
32
+ "transformer.h.12.mlp.dense_4h_to_h.weight": "model-00007-of-00015.safetensors",
33
+ "transformer.h.12.mlp.dense_h_to_4h.weight": "model-00007-of-00015.safetensors",
34
+ "transformer.h.12.self_attention.dense.weight": "model-00006-of-00015.safetensors",
35
+ "transformer.h.12.self_attention.query_key_value.weight": "model-00006-of-00015.safetensors",
36
+ "transformer.h.13.input_layernorm.bias": "model-00007-of-00015.safetensors",
37
+ "transformer.h.13.input_layernorm.weight": "model-00007-of-00015.safetensors",
38
+ "transformer.h.13.mlp.dense_4h_to_h.weight": "model-00007-of-00015.safetensors",
39
+ "transformer.h.13.mlp.dense_h_to_4h.weight": "model-00007-of-00015.safetensors",
40
+ "transformer.h.13.self_attention.dense.weight": "model-00007-of-00015.safetensors",
41
+ "transformer.h.13.self_attention.query_key_value.weight": "model-00007-of-00015.safetensors",
42
+ "transformer.h.14.input_layernorm.bias": "model-00007-of-00015.safetensors",
43
+ "transformer.h.14.input_layernorm.weight": "model-00007-of-00015.safetensors",
44
+ "transformer.h.14.mlp.dense_4h_to_h.weight": "model-00008-of-00015.safetensors",
45
+ "transformer.h.14.mlp.dense_h_to_4h.weight": "model-00007-of-00015.safetensors",
46
+ "transformer.h.14.self_attention.dense.weight": "model-00007-of-00015.safetensors",
47
+ "transformer.h.14.self_attention.query_key_value.weight": "model-00007-of-00015.safetensors",
48
+ "transformer.h.15.input_layernorm.bias": "model-00008-of-00015.safetensors",
49
+ "transformer.h.15.input_layernorm.weight": "model-00008-of-00015.safetensors",
50
+ "transformer.h.15.mlp.dense_4h_to_h.weight": "model-00008-of-00015.safetensors",
51
+ "transformer.h.15.mlp.dense_h_to_4h.weight": "model-00008-of-00015.safetensors",
52
+ "transformer.h.15.self_attention.dense.weight": "model-00008-of-00015.safetensors",
53
+ "transformer.h.15.self_attention.query_key_value.weight": "model-00008-of-00015.safetensors",
54
+ "transformer.h.16.input_layernorm.bias": "model-00008-of-00015.safetensors",
55
+ "transformer.h.16.input_layernorm.weight": "model-00008-of-00015.safetensors",
56
+ "transformer.h.16.mlp.dense_4h_to_h.weight": "model-00008-of-00015.safetensors",
57
+ "transformer.h.16.mlp.dense_h_to_4h.weight": "model-00008-of-00015.safetensors",
58
+ "transformer.h.16.self_attention.dense.weight": "model-00008-of-00015.safetensors",
59
+ "transformer.h.16.self_attention.query_key_value.weight": "model-00008-of-00015.safetensors",
60
+ "transformer.h.17.input_layernorm.bias": "model-00008-of-00015.safetensors",
61
+ "transformer.h.17.input_layernorm.weight": "model-00008-of-00015.safetensors",
62
+ "transformer.h.17.mlp.dense_4h_to_h.weight": "model-00009-of-00015.safetensors",
63
+ "transformer.h.17.mlp.dense_h_to_4h.weight": "model-00009-of-00015.safetensors",
64
+ "transformer.h.17.self_attention.dense.weight": "model-00009-of-00015.safetensors",
65
+ "transformer.h.17.self_attention.query_key_value.weight": "model-00009-of-00015.safetensors",
66
+ "transformer.h.18.input_layernorm.bias": "model-00009-of-00015.safetensors",
67
+ "transformer.h.18.input_layernorm.weight": "model-00009-of-00015.safetensors",
68
+ "transformer.h.18.mlp.dense_4h_to_h.weight": "model-00009-of-00015.safetensors",
69
+ "transformer.h.18.mlp.dense_h_to_4h.weight": "model-00009-of-00015.safetensors",
70
+ "transformer.h.18.self_attention.dense.weight": "model-00009-of-00015.safetensors",
71
+ "transformer.h.18.self_attention.query_key_value.weight": "model-00009-of-00015.safetensors",
72
+ "transformer.h.19.input_layernorm.bias": "model-00009-of-00015.safetensors",
73
+ "transformer.h.19.input_layernorm.weight": "model-00009-of-00015.safetensors",
74
+ "transformer.h.19.mlp.dense_4h_to_h.weight": "model-00010-of-00015.safetensors",
75
+ "transformer.h.19.mlp.dense_h_to_4h.weight": "model-00010-of-00015.safetensors",
76
+ "transformer.h.19.self_attention.dense.weight": "model-00009-of-00015.safetensors",
77
+ "transformer.h.19.self_attention.query_key_value.weight": "model-00009-of-00015.safetensors",
78
+ "transformer.h.2.input_layernorm.bias": "model-00002-of-00015.safetensors",
79
+ "transformer.h.2.input_layernorm.weight": "model-00002-of-00015.safetensors",
80
+ "transformer.h.2.mlp.dense_4h_to_h.weight": "model-00002-of-00015.safetensors",
81
+ "transformer.h.2.mlp.dense_h_to_4h.weight": "model-00002-of-00015.safetensors",
82
+ "transformer.h.2.self_attention.dense.weight": "model-00002-of-00015.safetensors",
83
+ "transformer.h.2.self_attention.query_key_value.weight": "model-00002-of-00015.safetensors",
84
+ "transformer.h.20.input_layernorm.bias": "model-00010-of-00015.safetensors",
85
+ "transformer.h.20.input_layernorm.weight": "model-00010-of-00015.safetensors",
86
+ "transformer.h.20.mlp.dense_4h_to_h.weight": "model-00010-of-00015.safetensors",
87
+ "transformer.h.20.mlp.dense_h_to_4h.weight": "model-00010-of-00015.safetensors",
88
+ "transformer.h.20.self_attention.dense.weight": "model-00010-of-00015.safetensors",
89
+ "transformer.h.20.self_attention.query_key_value.weight": "model-00010-of-00015.safetensors",
90
+ "transformer.h.21.input_layernorm.bias": "model-00010-of-00015.safetensors",
91
+ "transformer.h.21.input_layernorm.weight": "model-00010-of-00015.safetensors",
92
+ "transformer.h.21.mlp.dense_4h_to_h.weight": "model-00011-of-00015.safetensors",
93
+ "transformer.h.21.mlp.dense_h_to_4h.weight": "model-00010-of-00015.safetensors",
94
+ "transformer.h.21.self_attention.dense.weight": "model-00010-of-00015.safetensors",
95
+ "transformer.h.21.self_attention.query_key_value.weight": "model-00010-of-00015.safetensors",
96
+ "transformer.h.22.input_layernorm.bias": "model-00011-of-00015.safetensors",
97
+ "transformer.h.22.input_layernorm.weight": "model-00011-of-00015.safetensors",
98
+ "transformer.h.22.mlp.dense_4h_to_h.weight": "model-00011-of-00015.safetensors",
99
+ "transformer.h.22.mlp.dense_h_to_4h.weight": "model-00011-of-00015.safetensors",
100
+ "transformer.h.22.self_attention.dense.weight": "model-00011-of-00015.safetensors",
101
+ "transformer.h.22.self_attention.query_key_value.weight": "model-00011-of-00015.safetensors",
102
+ "transformer.h.23.input_layernorm.bias": "model-00011-of-00015.safetensors",
103
+ "transformer.h.23.input_layernorm.weight": "model-00011-of-00015.safetensors",
104
+ "transformer.h.23.mlp.dense_4h_to_h.weight": "model-00011-of-00015.safetensors",
105
+ "transformer.h.23.mlp.dense_h_to_4h.weight": "model-00011-of-00015.safetensors",
106
+ "transformer.h.23.self_attention.dense.weight": "model-00011-of-00015.safetensors",
107
+ "transformer.h.23.self_attention.query_key_value.weight": "model-00011-of-00015.safetensors",
108
+ "transformer.h.24.input_layernorm.bias": "model-00011-of-00015.safetensors",
109
+ "transformer.h.24.input_layernorm.weight": "model-00011-of-00015.safetensors",
110
+ "transformer.h.24.mlp.dense_4h_to_h.weight": "model-00012-of-00015.safetensors",
111
+ "transformer.h.24.mlp.dense_h_to_4h.weight": "model-00012-of-00015.safetensors",
112
+ "transformer.h.24.self_attention.dense.weight": "model-00012-of-00015.safetensors",
113
+ "transformer.h.24.self_attention.query_key_value.weight": "model-00012-of-00015.safetensors",
114
+ "transformer.h.25.input_layernorm.bias": "model-00012-of-00015.safetensors",
115
+ "transformer.h.25.input_layernorm.weight": "model-00012-of-00015.safetensors",
116
+ "transformer.h.25.mlp.dense_4h_to_h.weight": "model-00012-of-00015.safetensors",
117
+ "transformer.h.25.mlp.dense_h_to_4h.weight": "model-00012-of-00015.safetensors",
118
+ "transformer.h.25.self_attention.dense.weight": "model-00012-of-00015.safetensors",
119
+ "transformer.h.25.self_attention.query_key_value.weight": "model-00012-of-00015.safetensors",
120
+ "transformer.h.26.input_layernorm.bias": "model-00012-of-00015.safetensors",
121
+ "transformer.h.26.input_layernorm.weight": "model-00012-of-00015.safetensors",
122
+ "transformer.h.26.mlp.dense_4h_to_h.weight": "model-00013-of-00015.safetensors",
123
+ "transformer.h.26.mlp.dense_h_to_4h.weight": "model-00013-of-00015.safetensors",
124
+ "transformer.h.26.self_attention.dense.weight": "model-00012-of-00015.safetensors",
125
+ "transformer.h.26.self_attention.query_key_value.weight": "model-00012-of-00015.safetensors",
126
+ "transformer.h.27.input_layernorm.bias": "model-00013-of-00015.safetensors",
127
+ "transformer.h.27.input_layernorm.weight": "model-00013-of-00015.safetensors",
128
+ "transformer.h.27.mlp.dense_4h_to_h.weight": "model-00013-of-00015.safetensors",
129
+ "transformer.h.27.mlp.dense_h_to_4h.weight": "model-00013-of-00015.safetensors",
130
+ "transformer.h.27.self_attention.dense.weight": "model-00013-of-00015.safetensors",
131
+ "transformer.h.27.self_attention.query_key_value.weight": "model-00013-of-00015.safetensors",
132
+ "transformer.h.28.input_layernorm.bias": "model-00013-of-00015.safetensors",
133
+ "transformer.h.28.input_layernorm.weight": "model-00013-of-00015.safetensors",
134
+ "transformer.h.28.mlp.dense_4h_to_h.weight": "model-00014-of-00015.safetensors",
135
+ "transformer.h.28.mlp.dense_h_to_4h.weight": "model-00013-of-00015.safetensors",
136
+ "transformer.h.28.self_attention.dense.weight": "model-00013-of-00015.safetensors",
137
+ "transformer.h.28.self_attention.query_key_value.weight": "model-00013-of-00015.safetensors",
138
+ "transformer.h.29.input_layernorm.bias": "model-00014-of-00015.safetensors",
139
+ "transformer.h.29.input_layernorm.weight": "model-00014-of-00015.safetensors",
140
+ "transformer.h.29.mlp.dense_4h_to_h.weight": "model-00014-of-00015.safetensors",
141
+ "transformer.h.29.mlp.dense_h_to_4h.weight": "model-00014-of-00015.safetensors",
142
+ "transformer.h.29.self_attention.dense.weight": "model-00014-of-00015.safetensors",
143
+ "transformer.h.29.self_attention.query_key_value.weight": "model-00014-of-00015.safetensors",
144
+ "transformer.h.3.input_layernorm.bias": "model-00002-of-00015.safetensors",
145
+ "transformer.h.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
146
+ "transformer.h.3.mlp.dense_4h_to_h.weight": "model-00003-of-00015.safetensors",
147
+ "transformer.h.3.mlp.dense_h_to_4h.weight": "model-00003-of-00015.safetensors",
148
+ "transformer.h.3.self_attention.dense.weight": "model-00003-of-00015.safetensors",
149
+ "transformer.h.3.self_attention.query_key_value.weight": "model-00003-of-00015.safetensors",
150
+ "transformer.h.30.input_layernorm.bias": "model-00014-of-00015.safetensors",
151
+ "transformer.h.30.input_layernorm.weight": "model-00014-of-00015.safetensors",
152
+ "transformer.h.30.mlp.dense_4h_to_h.weight": "model-00014-of-00015.safetensors",
153
+ "transformer.h.30.mlp.dense_h_to_4h.weight": "model-00014-of-00015.safetensors",
154
+ "transformer.h.30.self_attention.dense.weight": "model-00014-of-00015.safetensors",
155
+ "transformer.h.30.self_attention.query_key_value.weight": "model-00014-of-00015.safetensors",
156
+ "transformer.h.31.input_layernorm.bias": "model-00014-of-00015.safetensors",
157
+ "transformer.h.31.input_layernorm.weight": "model-00014-of-00015.safetensors",
158
+     "transformer.h.31.mlp.dense_4h_to_h.weight": "model-00015-of-00015.safetensors",
+     "transformer.h.31.mlp.dense_h_to_4h.weight": "model-00015-of-00015.safetensors",
+     "transformer.h.31.self_attention.dense.weight": "model-00015-of-00015.safetensors",
+     "transformer.h.31.self_attention.query_key_value.weight": "model-00015-of-00015.safetensors",
+     "transformer.h.4.input_layernorm.bias": "model-00003-of-00015.safetensors",
+     "transformer.h.4.input_layernorm.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.4.mlp.dense_4h_to_h.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.4.mlp.dense_h_to_4h.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.4.self_attention.dense.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.4.self_attention.query_key_value.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.5.input_layernorm.bias": "model-00003-of-00015.safetensors",
+     "transformer.h.5.input_layernorm.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.5.mlp.dense_4h_to_h.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.5.mlp.dense_h_to_4h.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.5.self_attention.dense.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.5.self_attention.query_key_value.weight": "model-00003-of-00015.safetensors",
+     "transformer.h.6.input_layernorm.bias": "model-00004-of-00015.safetensors",
+     "transformer.h.6.input_layernorm.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.6.mlp.dense_4h_to_h.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.6.mlp.dense_h_to_4h.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.6.self_attention.dense.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.6.self_attention.query_key_value.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.7.input_layernorm.bias": "model-00004-of-00015.safetensors",
+     "transformer.h.7.input_layernorm.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.7.mlp.dense_4h_to_h.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.7.mlp.dense_h_to_4h.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.7.self_attention.dense.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.7.self_attention.query_key_value.weight": "model-00004-of-00015.safetensors",
+     "transformer.h.8.input_layernorm.bias": "model-00005-of-00015.safetensors",
+     "transformer.h.8.input_layernorm.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.8.mlp.dense_4h_to_h.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.8.mlp.dense_h_to_4h.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.8.self_attention.dense.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.8.self_attention.query_key_value.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.9.input_layernorm.bias": "model-00005-of-00015.safetensors",
+     "transformer.h.9.input_layernorm.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.9.mlp.dense_4h_to_h.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.9.mlp.dense_h_to_4h.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.9.self_attention.dense.weight": "model-00005-of-00015.safetensors",
+     "transformer.h.9.self_attention.query_key_value.weight": "model-00005-of-00015.safetensors",
+     "transformer.ln_f.bias": "model-00015-of-00015.safetensors",
+     "transformer.ln_f.weight": "model-00015-of-00015.safetensors",
+     "transformer.word_embeddings.weight": "model-00001-of-00015.safetensors"
+   }
+ }
modeling_falcon.py ADDED
@@ -0,0 +1,1242 @@
+ # coding=utf-8
+ # Copyright 2023 the Falcon authors and HuggingFace Inc. team. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """PyTorch Falcon model."""
+
+ import math
+ from typing import Optional, Tuple, Union
+
+ import torch
+ import torch.utils.checkpoint
+ from torch import nn
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, LayerNorm, MSELoss
+ from torch.nn import functional as F
+
+ from transformers.modeling_outputs import (
+     BaseModelOutputWithPastAndCrossAttentions,
+     CausalLMOutputWithCrossAttentions,
+     QuestionAnsweringModelOutput,
+     SequenceClassifierOutputWithPast,
+     TokenClassifierOutput,
+ )
+ from transformers.modeling_utils import PreTrainedModel
+ from transformers.utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
+ from .configuration_falcon import FalconConfig
+
+
+ logger = logging.get_logger(__name__)
+
+ FALCON_PRETRAINED_MODEL_ARCHIVE_LIST = [
+     "tiiuae/falcon-40b",
+     "tiiuae/falcon-40b-instruct",
+     "tiiuae/falcon-7b",
+     "tiiuae/falcon-7b-instruct",
+     "tiiuae/falcon-rw-7b",
+     "tiiuae/falcon-rw-1b",
+ ]
+ _CHECKPOINT_FOR_DOC = "Rocketknight1/falcon-rw-1b"
+ _CONFIG_FOR_DOC = "FalconConfig"
+
+
+ # NOTE(Hesslow): Unfortunately we did not fuse matmul and bias during training, this means that there's one additional quantization to bfloat16 between the operations.
+ # In order not to degrade the quality of our HF-port, we keep these characteristics in the final model.
+ class FalconLinear(nn.Linear):
+     def forward(self, input: torch.Tensor) -> torch.Tensor:
+         hidden_states = input @ self.weight.T
+         if self.bias is None:
+             return hidden_states
+         return hidden_states + self.bias
+
+
+ # rotary pos emb helpers (torch.jit.script does not seem to support staticmethod...)
+ def rotate_half(x):
+     x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
+     return torch.cat((-x2, x1), dim=-1)
+
+
+ class FalconRotaryEmbedding(nn.Module):
+     """Implementation of RotaryEmbedding from GPT-NeoX.
+     This implementation is designed to operate on queries and keys that are compatible with `[batch_size,
+     n_heads_per_partition, seq_len, head_dim]` (e.g. MinGPTAttention format).
+     """
+
+     def __init__(self, head_dim: int, base=10000):
+         super().__init__()
+         inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
+         self.register_buffer("inv_freq", inv_freq, persistent=False)
+         self.head_dim = head_dim
+         self.seq_len_cached = -1
+         self.cos_cached: torch.Tensor | None = None
+         self.sin_cached: torch.Tensor | None = None
+
+     def cos_sin(self, seq_len: int, past_key_values_length: int, device="cpu", dtype=torch.bfloat16) -> torch.Tensor:
+         total_length = seq_len + past_key_values_length
+         if total_length > self.seq_len_cached:
+             self.seq_len_cached = total_length
+             t = torch.arange(total_length, device=device, dtype=self.inv_freq.dtype)
+             freqs = torch.einsum("i,j->ij", t, self.inv_freq)
+             emb = torch.cat((freqs, freqs), dim=-1).to(device)
+
+             if dtype in [torch.float16, torch.bfloat16]:
+                 emb = emb.float()
+
+             self.cos_cached = emb.cos()[None, :, :]
+             self.sin_cached = emb.sin()[None, :, :]
+
+             self.cos_cached = self.cos_cached.type(dtype)
+             self.sin_cached = self.sin_cached.type(dtype)
+
+         return (
+             self.cos_cached[:, past_key_values_length : seq_len + past_key_values_length],
+             self.sin_cached[:, past_key_values_length : seq_len + past_key_values_length],
+         )
+
+     def forward(self, query, key, past_key_values_length=0):
+         batch, seq_len, head_dim = query.shape
+         cos, sin = self.cos_sin(seq_len, past_key_values_length, query.device, query.dtype)
+         return (query * cos) + (rotate_half(query) * sin), (key * cos) + (rotate_half(key) * sin)
+
+
+ def _make_causal_mask(
+     input_ids_shape: torch.Size, device: torch.device, past_key_values_length: int
+ ) -> torch.BoolTensor:
+     """
+     Make causal mask used for self-attention. This mask does not take the existing attention mask into account - it
+     just blocks tokens from attending forwards in the sequence. The output shape will be `[batch_size, 1,
+     target_length, target_length+past_key_values_length]`.
+     """
+     batch_size, target_length = input_ids_shape
+
+     mask = torch.triu(torch.ones((target_length, target_length), dtype=torch.bool, device=device), diagonal=1)
+     # If past_key_values_length is 0 this is an empty tensor and the concatenation is a no-op.
+     # This code style is an unfortunate consequence of getting your TF engineer to port models; doing it this
+     # way avoids a data-dependent conditional, which will help me when I have to port this to XLA later.
+     past_mask = torch.zeros((target_length, past_key_values_length), dtype=torch.bool, device=device)
+     mask = torch.cat([past_mask, mask], dim=-1)
+     expanded_mask = mask[None, None, :, :].expand(batch_size, 1, target_length, target_length + past_key_values_length)
+     return expanded_mask
+
+
+ def _expand_mask(mask: torch.Tensor, past_key_values_length: int) -> torch.BoolTensor:
+     """
+     Expands attention_mask from `[batch_size, seq_length]` to `[batch_size, 1, seq_length, seq_length + past_length]`.
+     """
+     batch_size, total_length = mask.shape
+     seq_length = total_length - past_key_values_length if past_key_values_length is not None else total_length
+
+     expanded_mask = ~(mask[:, None, None, :].to(torch.bool))
+     return expanded_mask.expand(batch_size, 1, seq_length, total_length)
+
+
+ def build_alibi_tensor(attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype) -> torch.Tensor:
+     batch_size, seq_length = attention_mask.shape
+     closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
+     base = torch.tensor(
+         2 ** (-(2 ** -(math.log2(closest_power_of_2) - 3))), device=attention_mask.device, dtype=torch.float32
+     )
+     powers = torch.arange(1, 1 + closest_power_of_2, device=attention_mask.device, dtype=torch.int32)
+     slopes = torch.pow(base, powers)
+
+     if closest_power_of_2 != num_heads:
+         extra_base = torch.tensor(
+             2 ** (-(2 ** -(math.log2(2 * closest_power_of_2) - 3))), device=attention_mask.device, dtype=torch.float32
+         )
+         num_remaining_heads = min(closest_power_of_2, num_heads - closest_power_of_2)
+         extra_powers = torch.arange(1, 1 + 2 * num_remaining_heads, 2, device=attention_mask.device, dtype=torch.int32)
+         slopes = torch.cat([slopes, torch.pow(extra_base, extra_powers)], dim=0)
+
+     # Note: alibi will be added to the attention bias that will be applied to the query, key product of attention
+     # => therefore alibi will have to be of shape (batch_size, num_heads, query_length, key_length)
+     # => here we set (batch_size=1, num_heads=num_heads, query_length=1, key_length=max_length)
+     # => the query_length dimension will then be broadcasted correctly
+     # This is more or less identical to T5's relative position bias:
+     # https://github.com/huggingface/transformers/blob/f681437203baa7671de3174b0fa583c349d9d5e1/src/transformers/models/t5/modeling_t5.py#L527
+     arange_tensor = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
+     alibi = slopes[..., None].bfloat16() * arange_tensor
+     return alibi.reshape(batch_size * num_heads, 1, seq_length).to(dtype)
+
+
+ # Copied from transformers.models.bloom.modeling_bloom.dropout_add
+ def dropout_add(x: torch.Tensor, residual: torch.Tensor, prob: float, training: bool) -> torch.Tensor:
+     """
+     Dropout add function
+     Args:
+         x (`torch.tensor`, *required*):
+             input tensor
+         residual (`torch.tensor`, *required*):
+             residual tensor
+         prob (`float`, *required*):
+             dropout probability
+         training (`bool`, *required*):
+             training mode
+     """
+     out = F.dropout(x, p=prob, training=training)
+     out = residual + out
+     return out
+
+
+ class FalconAttention(nn.Module):
+     def __init__(self, config: FalconConfig):
+         super().__init__()
+
+         self.hidden_size = config.hidden_size
+         self.num_heads = config.num_attention_heads
+         self.head_dim = self.hidden_size // self.num_heads
+         self.split_size = self.hidden_size
+         self.hidden_dropout = config.hidden_dropout
+
+         if self.head_dim * self.num_heads != self.hidden_size:
+             raise ValueError(
+                 f"`hidden_size` must be divisible by num_heads (got `hidden_size`: {self.hidden_size} and `num_heads`:"
+                 f" {self.num_heads})."
+             )
+
+         self.maybe_rotary = FalconRotaryEmbedding(config.head_dim) if config.rotary else lambda q, k, t: (q, k)
+
+         # Layer-wise attention scaling
+         self.inv_norm_factor = 1.0 / math.sqrt(self.head_dim)
+         self.beta = self.inv_norm_factor
+         if config.new_decoder_architecture:
+             qkv_out_dim = (config.num_kv_heads * 2 + config.num_attention_heads) * self.head_dim
+         elif config.multi_query:
+             qkv_out_dim = self.hidden_size + 2 * self.head_dim
+         else:
+             qkv_out_dim = 3 * self.hidden_size
+         self.query_key_value = FalconLinear(self.hidden_size, qkv_out_dim, bias=config.bias)
+         self.new_decoder_architecture = config.new_decoder_architecture
+         self.multi_query = config.multi_query
+         self.dense = FalconLinear(self.hidden_size, self.hidden_size, bias=config.bias)
+         self.attention_dropout = nn.Dropout(config.attention_dropout)
+         self.num_kv_heads = config.num_kv_heads if (self.new_decoder_architecture or not self.multi_query) else 1
+
+     def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
+         """
+         Split the last dimension into (num_heads, head_dim), results share same memory storage as `fused_qkv`
+         Args:
+             fused_qkv (`torch.tensor`, *required*): [batch_size, seq_length, num_heads * 3 * head_dim]
+         Returns:
+             query: [batch_size, seq_length, num_heads, head_dim] key: [batch_size, seq_length, num_heads, head_dim]
+             value: [batch_size, seq_length, num_heads, head_dim]
+         """
+         if self.new_decoder_architecture:
+             batch, seq_len, _ = fused_qkv.shape
+             qkv = fused_qkv.view(batch, seq_len, -1, self.num_heads // self.num_kv_heads + 2, self.head_dim)
+             query = qkv[:, :, :, :-2]
+             key = qkv[:, :, :, [-2]]
+             value = qkv[:, :, :, [-1]]
+             key = torch.broadcast_to(key, query.shape)
+             value = torch.broadcast_to(value, query.shape)
+
+             query, key, value = [x.flatten(2, 3) for x in (query, key, value)]
+             return query, key, value
+         elif not self.multi_query:
+             batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
+             fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads, 3, self.head_dim)
+             return fused_qkv[..., 0, :], fused_qkv[..., 1, :], fused_qkv[..., 2, :]
+         else:
+             batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
+             fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
+             return fused_qkv[..., :-2, :], fused_qkv[..., [-2], :], fused_qkv[..., [-1], :]
+
+     # Copied from transformers.models.bloom.modeling_bloom.BloomAttention._merge_heads
+     def _merge_heads(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         Merge heads together over the last dimension
+         Args:
+             x (`torch.tensor`, *required*): [batch_size * num_heads, seq_length, head_dim]
+         Returns:
+             torch.tensor: [batch_size, seq_length, num_heads * head_dim]
+         """
+         # What we want to achieve is:
+         # batch_size * num_heads, seq_length, head_dim -> batch_size, seq_length, num_heads * head_dim
+         batch_size_and_num_heads, seq_length, _ = x.shape
+         batch_size = batch_size_and_num_heads // self.num_heads
+
+         # First view to decompose the batch size
+         # batch_size * num_heads, seq_length, head_dim -> batch_size, num_heads, seq_length, head_dim
+         x = x.view(batch_size, self.num_heads, seq_length, self.head_dim)
+
+         # batch_size, num_heads, seq_length, head_dim -> batch_size, seq_length, num_heads, head_dim
+         x = x.permute(0, 2, 1, 3)
+
+         # batch_size, seq_length, num_heads, head_dim -> batch_size, seq_length, num_heads * head_dim
+         return x.reshape(batch_size, seq_length, self.num_heads * self.head_dim)
+
+     def forward(
+         self,
+         hidden_states: torch.Tensor,
+         alibi: Optional[torch.Tensor],
+         attention_mask: torch.Tensor,
+         layer_past: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         use_cache: bool = False,
+         output_attentions: bool = False,
+     ):
+         fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
+         num_kv_heads = self.num_heads if self.new_decoder_architecture else self.num_kv_heads
+         # 3 x [batch_size, seq_length, num_heads, head_dim]
+         (query_layer, key_layer, value_layer) = self._split_heads(fused_qkv)
+
+         batch_size, query_length, _, _ = query_layer.shape
+
+         query_layer = query_layer.transpose(1, 2).reshape(batch_size * self.num_heads, query_length, self.head_dim)
+         key_layer = key_layer.transpose(1, 2).reshape(
+             batch_size * num_kv_heads,
+             query_length,
+             self.head_dim,
+         )
+         value_layer = value_layer.transpose(1, 2).reshape(batch_size * num_kv_heads, query_length, self.head_dim)
+
+         past_kv_length = 0 if layer_past is None else layer_past[0].shape[1]
+         query_layer, key_layer = self.maybe_rotary(query_layer, key_layer, past_kv_length)
+
+         if layer_past is not None:
+             past_key, past_value = layer_past
+             # concatenate along seq_length dimension:
+             # - key: [batch_size * self.num_heads, kv_length, head_dim]
+             # - value: [batch_size * self.num_heads, kv_length, head_dim]
+             key_layer = torch.cat((past_key, key_layer), dim=1)
+             value_layer = torch.cat((past_value, value_layer), dim=1)
+
+         _, kv_length, _ = key_layer.shape
+         if use_cache:
+             present = (key_layer, value_layer)
+         else:
+             present = None
+
+         attention_mask_float = (attention_mask * 1.0).masked_fill(attention_mask, float("-1e9")).to(query_layer.dtype)
+
+         query_layer_ = query_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)
+         key_layer_ = key_layer.reshape(batch_size, num_kv_heads, -1, self.head_dim)
+         value_layer_ = value_layer.reshape(batch_size, num_kv_heads, -1, self.head_dim)
+
+         if alibi is None:
+             if output_attentions:
+                 # F.scaled_dot_product_attention doesn't return the attention weights, so we have
+                 # to do it by hand if we want them
+                 attention_scores = query_layer_ @ key_layer_.transpose(-1, -2)
+                 attention_scores /= math.sqrt(self.head_dim)
+
+                 attention_scores = F.softmax(
+                     attention_scores + attention_mask_float, dim=-1, dtype=hidden_states.dtype
+                 )
+                 attn_output = attention_scores @ value_layer_
+             else:
+                 attn_output = F.scaled_dot_product_attention(
+                     query_layer_, key_layer_, value_layer_, attention_mask_float, 0.0, is_causal=False
+                 )
+                 attention_scores = None
+
+             attn_output = attn_output.view(batch_size, self.num_heads, query_length, self.head_dim)
+             attn_output = attn_output.permute(0, 2, 1, 3)
+             attn_output = attn_output.reshape(batch_size, query_length, self.num_heads * self.head_dim)
+
+             output_tensor = self.dense(attn_output)
+
+             if output_attentions:
+                 return output_tensor, present, attention_scores
+             else:
+                 return output_tensor, present
+
+         else:
+             matmul_result = query_layer_ @ key_layer_.transpose(-1, -2)
+
+             # change view to [batch_size, num_heads, q_length, kv_length]
+             attention_scores = matmul_result.view(batch_size, self.num_heads, query_length, kv_length)
+
+             # cast attention scores to fp32, compute scaled softmax and cast back to initial dtype - [batch_size, num_heads, q_length, kv_length]
+             input_dtype = attention_scores.dtype
+             # `float16` has a minimum value of -65504.0, whereas `bfloat16` and `float32` have a minimum value of `-3.4e+38`
+             if input_dtype == torch.float16 or input_dtype == torch.bfloat16:
+                 attention_scores = attention_scores.to(torch.float32)
+             # Matt (HF) note: We could possibly use F.scaled_dot_product_attention here too, by
+             # adding (alibi * self.inv_norm_factor) to attention_mask_float. I think this would be mathematically
+             # equivalent and more performant, but there might be a numerical difference. If you're reading this
+             # and you'd like to experiment and maybe file a PR, feel free!
+             attention_logits = attention_scores + alibi.view(batch_size, self.num_heads, 1, -1)
+             attention_logits *= self.inv_norm_factor
+             attention_probs = F.softmax(attention_logits + attention_mask_float, dim=-1, dtype=hidden_states.dtype)
+             # [batch_size, num_heads, q_length, kv_length]
+             attention_probs = self.attention_dropout(attention_probs)
+
+             if head_mask is not None:
+                 attention_probs = attention_probs * head_mask
+
+             # change view [batch_size, num_heads, q_length, kv_length]
+             attention_probs_reshaped = attention_probs.view(batch_size, self.num_heads, query_length, kv_length)
+
+             # matmul: [batch_size * num_heads, q_length, head_dim]
+             context_layer = (attention_probs_reshaped @ value_layer_).flatten(0, 1)
+
+             # change view [batch_size, num_heads, q_length, head_dim]
+             context_layer = self._merge_heads(context_layer)
+
+             output_tensor = self.dense(context_layer)
+
+             if output_attentions:
+                 return output_tensor, present, attention_probs
+             else:
+                 return output_tensor, present
+
+
+ class FalconMLP(nn.Module):
+     def __init__(self, config: FalconConfig):
+         super().__init__()
+         hidden_size = config.hidden_size
+
+         self.dense_h_to_4h = FalconLinear(hidden_size, 4 * hidden_size, bias=config.bias)
+         self.act = nn.GELU()
+         self.dense_4h_to_h = FalconLinear(4 * hidden_size, hidden_size, bias=config.bias)
+         self.hidden_dropout = config.hidden_dropout
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         x = self.act(self.dense_h_to_4h(x))
+         x = self.dense_4h_to_h(x)
+         return x
+
+
+ class FalconDecoderLayer(nn.Module):
+     def __init__(self, config: FalconConfig):
+         super().__init__()
+         hidden_size = config.hidden_size
+         self.num_heads = config.num_attention_heads
+         self.self_attention = FalconAttention(config)
+         self.mlp = FalconMLP(config)
+         self.hidden_dropout = config.hidden_dropout
+         self.config = config
+
+         if config.new_decoder_architecture:
+             # The layer norm before self-attention
+             self.ln_attn = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
+             # The layer norm before the MLP
+             self.ln_mlp = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
+         else:
+             self.input_layernorm = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
+             if not config.parallel_attn:
+                 self.post_attention_layernorm = LayerNorm(hidden_size, eps=config.layer_norm_epsilon)
+
+     def forward(
+         self,
+         hidden_states: torch.Tensor,
+         alibi: Optional[torch.Tensor],
+         attention_mask: torch.Tensor,
+         layer_past: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         use_cache: bool = False,
+         output_attentions: bool = False,
+     ):
+         residual = hidden_states
+
+         if self.config.new_decoder_architecture:
+             attention_layernorm_out = self.ln_attn(hidden_states)
+             mlp_layernorm_out = self.ln_mlp(hidden_states)
+         else:
+             attention_layernorm_out = self.input_layernorm(hidden_states)
+
+         # Self attention.
+         attn_outputs = self.self_attention(
+             attention_layernorm_out,
+             layer_past=layer_past,
+             attention_mask=attention_mask,
+             alibi=alibi,
+             head_mask=head_mask,
+             use_cache=use_cache,
+             output_attentions=output_attentions,
+         )
+
+         attention_output = attn_outputs[0]
+
+         if not self.config.new_decoder_architecture:
+             if self.config.parallel_attn:
+                 mlp_layernorm_out = attention_layernorm_out
+             else:
+                 residual = dropout_add(
+                     attention_output, residual, self.config.attention_dropout, training=self.training
+                 )
+                 mlp_layernorm_out = self.post_attention_layernorm(residual)
+
+         outputs = attn_outputs[1:]
+
+         # MLP.
+         mlp_output = self.mlp(mlp_layernorm_out)
+
+         if self.config.new_decoder_architecture or self.config.parallel_attn:
+             mlp_output += attention_output
+
+         output = dropout_add(mlp_output, residual, self.config.hidden_dropout, training=self.training)
+
+         if use_cache:
+             outputs = (output,) + outputs
+         else:
+             outputs = (output,) + outputs[1:]
+
+         return outputs  # hidden_states, present, attentions
+
+
+ FALCON_START_DOCSTRING = r"""
+     This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
+     library implements for all its models (such as downloading or saving, resizing the input embeddings etc.)
+     This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
+     Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
+     and behavior.
+     Parameters:
+         config ([`FalconConfig`]): Model configuration class with all the parameters of the model.
+             Initializing with a config file does not load the weights associated with the model, only the
+             configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
+ """
+
+ FALCON_INPUTS_DOCSTRING = r"""
+     Args:
+         input_ids (`torch.LongTensor` of shape `(batch_size, input_ids_length)`):
+             `input_ids_length` = `sequence_length` if `past_key_values` is `None` else `past_key_values[0][0].shape[2]`
+             (`sequence_length` of input past key value states). Indices of input sequence tokens in the vocabulary.
+             If `past_key_values` is used, only `input_ids` that do not have their past calculated should be passed as
+             `input_ids`.
+             Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
+             [`PreTrainedTokenizer.__call__`] for details.
+             [What are input IDs?](../glossary#input-ids)
+         past_key_values (`Tuple[Tuple[torch.Tensor]]` of length `config.num_hidden_layers`):
+             Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (see
+             `past_key_values` output below). Can be used to speed up sequential decoding. The `input_ids` which have
+             their past given to this model should not be passed as `input_ids` as they have already been computed.
+             Each element of `past_key_values` is a tuple (past_key, past_value):
+             - past_key: [batch_size * num_heads, head_dim, kv_length]
+             - past_value: [batch_size * num_heads, kv_length, head_dim]
+         attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
+             - 1 for tokens that are **not masked**,
+             - 0 for tokens that are **masked**.
+             [What are attention masks?](../glossary#attention-mask)
+         head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
+             Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
+             - 1 indicates the head is **not masked**,
+             - 0 indicates the head is **masked**.
+         inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+             Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
+             is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
+             model's internal embedding lookup matrix.
+             If `past_key_values` is used, optionally only the last `inputs_embeds` have to be input (see
+             `past_key_values`).
+         use_cache (`bool`, *optional*):
+             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
+             `past_key_values`).
+         output_attentions (`bool`, *optional*):
+             Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+             tensors for more detail.
+         output_hidden_states (`bool`, *optional*):
+             Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+             more detail.
+         return_dict (`bool`, *optional*):
+             Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
+ """
+
+
545
+ class FalconPreTrainedModel(PreTrainedModel):
546
+ """
547
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
548
+ models.
549
+ """
550
+
551
+ config_class = FalconConfig
552
+ base_model_prefix = "transformer"
553
+ supports_gradient_checkpointing = True
554
+ _no_split_modules = ["FalconDecoderLayer"]
555
+
556
+ def __init__(self, *inputs, **kwargs):
557
+ super().__init__(*inputs, **kwargs)
558
+
559
+ def _init_weights(self, module: nn.Module):
560
+ """Initialize the weights."""
561
+ if isinstance(module, nn.Linear) or isinstance(module, FalconLinear):
562
+ # Slightly different from the TF version which uses truncated_normal for initialization
563
+ # cf https://github.com/pytorch/pytorch/pull/5617
564
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
565
+ if module.bias is not None:
566
+ module.bias.data.zero_()
567
+ elif isinstance(module, nn.Embedding):
568
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
569
+ if module.padding_idx is not None:
570
+ module.weight.data[module.padding_idx].zero_()
571
+ elif isinstance(module, LayerNorm):
572
+ module.bias.data.zero_()
573
+ module.weight.data.fill_(1.0)
574
+
575
+ # Copied from transformers.models.bloom.modeling_bloom.BloomPreTrainedModel._set_gradient_checkpointing with BloomModel->FalconModel
576
+ def _set_gradient_checkpointing(self, module: nn.Module, value: bool = False):
577
+ if isinstance(module, FalconModel):
578
+ module.gradient_checkpointing = value
579
+
580
+ @staticmethod
581
+ def _convert_cache_to_standard_format(
582
+ past_key_value: Tuple[Tuple[torch.Tensor, torch.Tensor]], batch_size: int
583
+ ) -> Tuple[Tuple[torch.Tensor, torch.Tensor]]:
584
+ """
585
+ Standardizes the format of the cache so as to match most implementations, i.e. to tuple(tuple([batch_size,
586
+ num_heads, ...]))
587
+ """
588
+ batch_size_times_num_heads, kv_length, head_dim = past_key_value[0][0].shape
589
+ # [batch_size * self.num_heads, kv_length, head_dim] -> [batch_size, num_heads, kv_length, head_dim]
590
+ # Note that we don't want to use self.num_attention_heads because the number of heads may vary depending
591
+ # on whether we use multi_query attention.
592
+ num_heads = batch_size_times_num_heads // batch_size
593
+ return tuple(
594
+ (
595
+ layer_past[0].view(batch_size, num_heads, kv_length, head_dim),
596
+ layer_past[1].view(batch_size, num_heads, kv_length, head_dim),
597
+ )
598
+ for layer_past in past_key_value
599
+ )
600
+
601
+ @staticmethod
602
+ def _convert_to_rw_cache(
603
+ past_key_value: Tuple[Tuple[torch.Tensor, torch.Tensor]]
604
+ ) -> Tuple[Tuple[torch.Tensor, torch.Tensor]]:
605
+ batch_size, num_heads, kv_length, head_dim = past_key_value[0][0].shape
606
+ batch_size_times_num_heads = batch_size * num_heads
607
+ # [batch_size, num_heads, kv_length, head_dim] -> [batch_size * num_heads, kv_length, head_dim]
608
+ return tuple(
609
+ (
610
+ layer_past[0].view(batch_size_times_num_heads, kv_length, head_dim),
611
+ layer_past[1].view(batch_size_times_num_heads, kv_length, head_dim),
612
+ )
613
+ for layer_past in past_key_value
614
+ )
615
+
616
+
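Note on the two cache converters above: they only reshape, so the conversion can be checked with shape arithmetic alone. The following is a torch-free sketch under that assumption; the helper names (`to_rw_shape`, `to_standard_shape`, `flat_index`) are hypothetical and not part of this commit.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def to_rw_shape(standard: Shape) -> Shape:
    """[batch, heads, kv_len, head_dim] -> [batch * heads, kv_len, head_dim]."""
    batch_size, num_heads, kv_length, head_dim = standard
    return (batch_size * num_heads, kv_length, head_dim)


def to_standard_shape(rw: Shape, batch_size: int) -> Shape:
    """[batch * heads, kv_len, head_dim] -> [batch, heads, kv_len, head_dim].

    The head count is recovered by division rather than read from the config,
    because with multi-query attention the cache holds a single key/value head.
    """
    batch_times_heads, kv_length, head_dim = rw
    if batch_times_heads % batch_size != 0:
        raise ValueError("leading cache dimension is not a multiple of batch_size")
    num_heads = batch_times_heads // batch_size
    return (batch_size, num_heads, kv_length, head_dim)


def flat_index(batch_idx: int, head_idx: int, num_heads: int) -> int:
    """Row of the merged [batch * heads] axis that holds (batch_idx, head_idx)."""
    return batch_idx * num_heads + head_idx


standard = (2, 8, 5, 64)
rw = to_rw_shape(standard)
round_trip = to_standard_shape(rw, batch_size=2)
multi_query = to_standard_shape((2, 5, 64), batch_size=2)
```

In the multi-query case the merged axis equals the batch size, so the recovered head count is 1, which is why the converter does not rely on `num_attention_heads`.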
617
+ @add_start_docstrings(
618
+ "The bare Falcon Model transformer outputting raw hidden-states without any specific head on top.",
619
+ FALCON_START_DOCSTRING,
620
+ )
621
+ class FalconModel(FalconPreTrainedModel):
622
+ def __init__(self, config: FalconConfig):
623
+ super().__init__(config)
624
+
625
+ self.embed_dim = config.hidden_size
626
+ self.num_heads = config.num_attention_heads
627
+ self.use_alibi = config.alibi
628
+
629
+ # Word embeddings (no LayerNorm is applied to the embeddings here)
630
+ self.word_embeddings = nn.Embedding(config.vocab_size, self.embed_dim)
631
+
632
+ # Transformer blocks
633
+ self.h = nn.ModuleList([FalconDecoderLayer(config) for _ in range(config.num_hidden_layers)])
634
+
635
+ # Final Layer Norm
636
+ self.ln_f = LayerNorm(self.embed_dim, eps=config.layer_norm_epsilon)
637
+
638
+ self.gradient_checkpointing = False
639
+
640
+ # Initialize weights and apply final processing
641
+ self.post_init()
642
+
643
+ def get_input_embeddings(self):
644
+ return self.word_embeddings
645
+
646
+ @staticmethod
647
+ def _prepare_attn_mask(
648
+ attention_mask: torch.Tensor, input_shape: Tuple[int, int], past_key_values_length: int
649
+ ) -> torch.BoolTensor:
650
+ # Create a causal mask
651
+ # The attention mask we receive as input should cover the whole extended sequence, including any past
652
+ # cache, so its shape should be [batch_size, seq_length + past_key_values_length]
653
+ # The output shape will be [batch_size, 1, seq_length, seq_length + past_key_values_length]
654
+ if input_shape[1] + past_key_values_length != attention_mask.shape[1]:
655
+ raise ValueError(
656
+ "Attention mask shape should be (batch_size, seq_length + past_key_values_length)"
657
+ f" but is {attention_mask.shape} with input_ids shape {input_shape} and past length"
658
+ f" {past_key_values_length}."
659
+ )
660
+ combined_attention_mask = None
661
+ device = attention_mask.device
662
+ _, seq_length = input_shape
663
+
664
+ if seq_length > 1:
665
+ combined_attention_mask = _make_causal_mask(
666
+ input_shape, device=device, past_key_values_length=past_key_values_length
667
+ )
668
+
669
+ # [batch_size, seq_length + past_key_values_length] -> [batch_size, 1, seq_length, seq_length + past_key_values_length]
670
+ expanded_attn_mask = _expand_mask(attention_mask, past_key_values_length=past_key_values_length)
671
+ combined_attention_mask = (
672
+ expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask | combined_attention_mask
673
+ )
674
+
675
+ return combined_attention_mask
676
+
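Note on `_prepare_attn_mask`: the result combines a causal mask with the padding mask using a logical OR. The sketch below reproduces that combination with plain lists. It assumes the convention that `True` means "masked out", which is what OR-ing the two masks implies but which is defined in helpers outside this hunk; `build_mask` is a hypothetical name, and the size-1 head axis of the real output is dropped for brevity.

```python
from typing import List


def build_mask(
    attention_mask: List[List[int]], seq_length: int, past_length: int
) -> List[List[List[bool]]]:
    """Return mask[b][i][j]; True means query i may NOT attend to key j."""
    total = seq_length + past_length
    for row in attention_mask:
        if len(row) != total:
            raise ValueError(
                "attention mask must cover seq_length + past_length positions"
            )
    result = []
    for row in attention_mask:
        per_query = []
        for i in range(seq_length):
            # Query i sits at absolute position i + past_length, so every key
            # after that position is in the future (causal part). Keys whose
            # attention_mask entry is 0 are padding (padding part).
            per_query.append(
                [(j > i + past_length) or row[j] == 0 for j in range(total)]
            )
        result.append(per_query)
    return result


no_cache = build_mask([[1, 1, 1]], seq_length=3, past_length=0)
with_cache = build_mask([[0, 1, 1]], seq_length=1, past_length=2)
```

With a single new token and a non-empty cache there are no future keys, so only the padding part contributes; this matches the source skipping the causal mask when `seq_length` is 1.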
677
+ def set_input_embeddings(self, new_embeddings: torch.Tensor):
678
+ self.word_embeddings = new_embeddings
679
+
680
+ @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
681
+ @add_code_sample_docstrings(
682
+ checkpoint=_CHECKPOINT_FOR_DOC,
683
+ output_type=BaseModelOutputWithPastAndCrossAttentions,
684
+ config_class=_CONFIG_FOR_DOC,
685
+ )
686
+ def forward(
687
+ self,
688
+ input_ids: Optional[torch.LongTensor] = None,
689
+ past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
690
+ attention_mask: Optional[torch.Tensor] = None,
691
+ head_mask: Optional[torch.LongTensor] = None,
692
+ inputs_embeds: Optional[torch.Tensor] = None,
693
+ use_cache: Optional[bool] = None,
694
+ output_attentions: Optional[bool] = None,
695
+ output_hidden_states: Optional[bool] = None,
696
+ return_dict: Optional[bool] = None,
697
+ ) -> Union[Tuple[torch.Tensor, ...], BaseModelOutputWithPastAndCrossAttentions]:
698
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
699
+ output_hidden_states = (
700
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
701
+ )
702
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
703
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
704
+
705
+ if input_ids is not None and inputs_embeds is not None:
706
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
707
+ elif input_ids is not None:
708
+ batch_size, seq_length = input_ids.shape
709
+ elif inputs_embeds is not None:
710
+ batch_size, seq_length, _ = inputs_embeds.shape
711
+ else:
712
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
713
+
714
+ if past_key_values is None:
715
+ past_key_values = tuple([None] * len(self.h))
716
+ else:
717
+ past_key_values = self._convert_to_rw_cache(past_key_values)
718
+
719
+ # Prepare head mask if needed
720
+ # 1.0 in head_mask indicates we keep the head
721
+ # attention_probs has shape batch_size x num_heads x N x N
722
+ # head_mask has shape n_layer x batch x num_heads x N x N
723
+ head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
724
+
725
+ if inputs_embeds is None:
726
+ inputs_embeds = self.word_embeddings(input_ids)
727
+
728
+ hidden_states = inputs_embeds
729
+
730
+ presents = () if use_cache else None
731
+ all_self_attentions = () if output_attentions else None
732
+ all_hidden_states = () if output_hidden_states else None
733
+
734
+ # Compute alibi tensor: check build_alibi_tensor documentation
735
+ past_key_values_length = 0
736
+ if past_key_values[0] is not None:
737
+ past_key_values_length = past_key_values[0][0].shape[1] # 1 because RW-cache, not standard format
738
+ if attention_mask is None:
739
+ attention_mask = torch.ones((batch_size, seq_length + past_key_values_length), device=hidden_states.device)
740
+ else:
741
+ attention_mask = attention_mask.to(hidden_states.device)
742
+
743
+ if self.use_alibi:
744
+ alibi = build_alibi_tensor(attention_mask, self.num_heads, dtype=hidden_states.dtype)
745
+ else:
746
+ alibi = None
747
+
748
+ causal_mask = self._prepare_attn_mask(
749
+ attention_mask,
750
+ input_shape=(batch_size, seq_length),
751
+ past_key_values_length=past_key_values_length,
752
+ )
753
+
754
+ for i, (block, layer_past) in enumerate(zip(self.h, past_key_values)):
755
+ if output_hidden_states:
756
+ all_hidden_states = all_hidden_states + (hidden_states,)
757
+
758
+ if self.gradient_checkpointing and self.training:
759
+ if use_cache:
760
+ logger.warning(
761
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
762
+ )
763
+ use_cache = False
764
+
765
+ def create_custom_forward(module):
766
+ def custom_forward(*inputs):
767
+ # None for past_key_value
768
+ return module(*inputs, use_cache=use_cache, output_attentions=output_attentions)
769
+
770
+ return custom_forward
771
+
772
+ outputs = torch.utils.checkpoint.checkpoint(
773
+ create_custom_forward(block),
774
+ hidden_states,
775
+ alibi,
776
+ causal_mask,
777
+ head_mask[i],
778
+ )
779
+ else:
780
+ outputs = block(
781
+ hidden_states,
782
+ layer_past=layer_past,
783
+ attention_mask=causal_mask,
784
+ head_mask=head_mask[i],
785
+ use_cache=use_cache,
786
+ output_attentions=output_attentions,
787
+ alibi=alibi,
788
+ )
789
+
790
+ hidden_states = outputs[0]
791
+ if use_cache is True:
792
+ presents = presents + (outputs[1],)
793
+
794
+ if output_attentions:
795
+ all_self_attentions = all_self_attentions + (outputs[2 if use_cache else 1],)
796
+
797
+ # Add last hidden state
798
+ hidden_states = self.ln_f(hidden_states)
799
+
800
+ if output_hidden_states:
801
+ all_hidden_states = all_hidden_states + (hidden_states,)
802
+
803
+ if presents is not None:
804
+ presents = self._convert_cache_to_standard_format(presents, batch_size)
805
+
806
+ if not return_dict:
807
+ return tuple(v for v in [hidden_states, presents, all_hidden_states, all_self_attentions] if v is not None)
808
+
809
+ return BaseModelOutputWithPastAndCrossAttentions(
810
+ last_hidden_state=hidden_states,
811
+ past_key_values=presents,
812
+ hidden_states=all_hidden_states,
813
+ attentions=all_self_attentions,
814
+ )
815
+
816
+
817
+ @add_start_docstrings(
818
+ "The Falcon Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).",
819
+ FALCON_START_DOCSTRING,
820
+ )
821
+ class FalconForCausalLM(FalconPreTrainedModel):
822
+ _tied_weights_keys = ["lm_head.weight"]
823
+
824
+ def __init__(self, config: FalconConfig):
825
+ super().__init__(config)
826
+ self.transformer = FalconModel(config)
827
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
828
+
829
+ # Initialize weights and apply final processing
830
+ self.post_init()
831
+
832
+ def get_output_embeddings(self):
833
+ return self.lm_head
834
+
835
+ def set_output_embeddings(self, new_embeddings: torch.Tensor):
836
+ self.lm_head = new_embeddings
837
+
838
+ def prepare_inputs_for_generation(
839
+ self,
840
+ input_ids: torch.LongTensor,
841
+ past_key_values: Optional[torch.Tensor] = None,
842
+ attention_mask: Optional[torch.Tensor] = None,
843
+ **kwargs,
844
+ ) -> dict:
845
+ if past_key_values is not None:
846
+ input_ids = input_ids[:, -1:]
847
+
848
+ return {
849
+ "input_ids": input_ids,
850
+ "past_key_values": past_key_values,
851
+ "use_cache": kwargs.get("use_cache"),
852
+ "attention_mask": attention_mask,
853
+ }
854
+
855
+ @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
856
+ @add_code_sample_docstrings(
857
+ checkpoint=_CHECKPOINT_FOR_DOC,
858
+ output_type=CausalLMOutputWithCrossAttentions,
859
+ config_class=_CONFIG_FOR_DOC,
860
+ )
861
+ def forward(
862
+ self,
863
+ input_ids: Optional[torch.LongTensor] = None,
864
+ past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
865
+ attention_mask: Optional[torch.Tensor] = None,
866
+ head_mask: Optional[torch.Tensor] = None,
867
+ inputs_embeds: Optional[torch.Tensor] = None,
868
+ labels: Optional[torch.Tensor] = None,
869
+ use_cache: Optional[bool] = None,
870
+ output_attentions: Optional[bool] = None,
871
+ output_hidden_states: Optional[bool] = None,
872
+ return_dict: Optional[bool] = None,
873
+ ) -> Union[Tuple[torch.Tensor], CausalLMOutputWithCrossAttentions]:
874
+ r"""
875
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
876
+ Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
877
+ `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size - 1]`. All labels set to `-100`
878
+ are ignored (masked); the loss is only computed for labels in `[0, ..., config.vocab_size - 1]`.
879
+ """
880
+
881
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
882
+
883
+ transformer_outputs = self.transformer(
884
+ input_ids,
885
+ past_key_values=past_key_values,
886
+ attention_mask=attention_mask,
887
+ head_mask=head_mask,
888
+ inputs_embeds=inputs_embeds,
889
+ use_cache=use_cache,
890
+ output_attentions=output_attentions,
891
+ output_hidden_states=output_hidden_states,
892
+ return_dict=return_dict,
893
+ )
894
+ hidden_states = transformer_outputs[0]
895
+
896
+ lm_logits = self.lm_head(hidden_states)
897
+
898
+ loss = None
899
+ if labels is not None:
900
+ # Shift so that tokens < n predict n
901
+ shift_logits = lm_logits[..., :-1, :].contiguous()
902
+ shift_labels = labels[..., 1:].contiguous()
903
+ batch_size, seq_length, vocab_size = shift_logits.shape
904
+ # Flatten the tokens
905
+ loss_fct = CrossEntropyLoss()
906
+ loss = loss_fct(
907
+ shift_logits.view(batch_size * seq_length, vocab_size), shift_labels.view(batch_size * seq_length)
908
+ )
909
+
910
+ if not return_dict:
911
+ output = (lm_logits,) + transformer_outputs[1:]
912
+ return ((loss,) + output) if loss is not None else output
913
+
914
+ return CausalLMOutputWithCrossAttentions(
915
+ loss=loss,
916
+ logits=lm_logits,
917
+ past_key_values=transformer_outputs.past_key_values,
918
+ hidden_states=transformer_outputs.hidden_states,
919
+ attentions=transformer_outputs.attentions,
920
+ )
921
+
922
+ def _reorder_cache(
923
+ self, past: Tuple[Tuple[torch.Tensor, torch.Tensor], ...], beam_idx: torch.LongTensor
924
+ ) -> Tuple[Tuple[torch.Tensor, torch.Tensor], ...]:
925
+ """
926
+ This function is used to re-order the `past_key_values` cache if [`~PreTrainedModel.beam_search`] or
927
+ [`~PreTrainedModel.beam_sample`] is called. This is required to match `past_key_values` with the correct
928
+ beam_idx at every generation step.
929
+ Output shares the same memory storage as `past`.
930
+ """
931
+
932
+ # Get a copy of `beam_idx` on all the devices where we need those indices.
933
+ device_to_beam_idx = {
934
+ past_state.device: beam_idx.to(past_state.device) for layer_past in past for past_state in layer_past
935
+ }
936
+ reordered_past = tuple(
937
+ (
938
+ layer_past[0].index_select(0, device_to_beam_idx[layer_past[0].device]),
939
+ layer_past[1].index_select(0, device_to_beam_idx[layer_past[0].device]),
940
+ )
941
+ for layer_past in past
942
+ )
943
+ return reordered_past
944
+
945
+
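Note on the causal LM loss in `FalconForCausalLM.forward`: logits at position t are scored against the label at position t + 1, and labels equal to `-100` are skipped because `CrossEntropyLoss` ignores that index by default. The following is a torch-free sketch for a single sequence; `shifted_lm_loss` is a hypothetical name, and the real code flattens the whole batch before averaging.

```python
import math
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def shifted_lm_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy where logits[t] predicts labels[t + 1]."""
    shift_logits = logits[:-1]  # the last position has nothing to predict
    shift_labels = labels[1:]  # the first token is never a target
    losses = []
    for row, target in zip(shift_logits, shift_labels):
        if target == IGNORE_INDEX:
            continue
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_z - row[target])
    if not losses:
        return float("nan")  # no valid targets to average over
    return sum(losses) / len(losses)


uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
loss_all = shifted_lm_loss(uniform_logits, [1, 2, 3])
loss_masked = shifted_lm_loss(uniform_logits, [1, IGNORE_INDEX, 3])
```

With uniform logits over a vocabulary of four tokens, every scored position contributes log 4, whether or not some targets are masked.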
946
+ @add_start_docstrings(
947
+ """
948
+ The Falcon Model transformer with a sequence classification head on top (linear layer).
949
+ [`FalconForSequenceClassification`] uses the last token in order to do the classification, as other causal models
950
+ (e.g. GPT-1) do.
951
+ Since it does classification on the last token, it needs to know the position of the last token. If a
952
+ `pad_token_id` is defined in the configuration, it finds the last token that is not a padding token in each row. If
953
+ no `pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot guess the
954
+ padding tokens when `inputs_embeds` are passed instead of `input_ids`, it does the same (take the last value in
955
+ each row of the batch).
956
+ """,
957
+ FALCON_START_DOCSTRING,
958
+ )
959
+ class FalconForSequenceClassification(FalconPreTrainedModel):
960
+ def __init__(self, config: FalconConfig):
961
+ super().__init__(config)
962
+ self.num_labels = config.num_labels
963
+ self.transformer = FalconModel(config)
964
+ self.score = nn.Linear(config.hidden_size, config.num_labels, bias=False)
965
+
966
+ # Initialize weights and apply final processing
967
+ self.post_init()
968
+
969
+ @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
970
+ @add_code_sample_docstrings(
971
+ checkpoint=_CHECKPOINT_FOR_DOC,
972
+ output_type=SequenceClassifierOutputWithPast,
973
+ config_class=_CONFIG_FOR_DOC,
974
+ )
975
+ def forward(
976
+ self,
977
+ input_ids: Optional[torch.LongTensor] = None,
978
+ past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
979
+ attention_mask: Optional[torch.Tensor] = None,
980
+ head_mask: Optional[torch.Tensor] = None,
981
+ inputs_embeds: Optional[torch.Tensor] = None,
982
+ labels: Optional[torch.Tensor] = None,
983
+ use_cache: Optional[bool] = None,
984
+ output_attentions: Optional[bool] = None,
985
+ output_hidden_states: Optional[bool] = None,
986
+ return_dict: Optional[bool] = None,
987
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutputWithPast]:
988
+ r"""
989
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
990
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
991
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss). If
992
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
993
+ """
994
+
995
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
996
+
997
+ transformer_outputs = self.transformer(
998
+ input_ids,
999
+ past_key_values=past_key_values,
1000
+ attention_mask=attention_mask,
1001
+ head_mask=head_mask,
1002
+ inputs_embeds=inputs_embeds,
1003
+ use_cache=use_cache,
1004
+ output_attentions=output_attentions,
1005
+ output_hidden_states=output_hidden_states,
1006
+ return_dict=return_dict,
1007
+ )
1008
+
1009
+ hidden_states = transformer_outputs[0]
1010
+ logits = self.score(hidden_states)
1011
+
1012
+ if input_ids is not None:
1013
+ batch_size = input_ids.shape[0]
1014
+ else:
1015
+ batch_size = inputs_embeds.shape[0]
1016
+
1017
+ if self.config.pad_token_id is None and batch_size != 1:
1018
+ raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
1019
+ if self.config.pad_token_id is None:
1020
+ sequence_lengths = -1
1021
+ else:
1022
+ if input_ids is not None:
1023
+ sequence_lengths = torch.ne(input_ids, self.config.pad_token_id).sum(dim=-1) - 1
1024
+ else:
1025
+ sequence_lengths = -1
1026
+ logger.warning(
1027
+ f"{self.__class__.__name__} will not detect padding tokens in `inputs_embeds`. Results may be "
1028
+ "unexpected if using padding tokens in conjunction with `inputs_embeds`."
1029
+ )
1030
+
1031
+ pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]
1032
+
1033
+ loss = None
1034
+ if labels is not None:
1035
+ if self.config.problem_type is None:
1036
+ if self.num_labels == 1:
1037
+ self.config.problem_type = "regression"
1038
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
1039
+ self.config.problem_type = "single_label_classification"
1040
+ else:
1041
+ self.config.problem_type = "multi_label_classification"
1042
+
1043
+ if self.config.problem_type == "regression":
1044
+ loss_fct = MSELoss()
1045
+ if self.num_labels == 1:
1046
+ loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
1047
+ else:
1048
+ loss = loss_fct(pooled_logits, labels)
1049
+ elif self.config.problem_type == "single_label_classification":
1050
+ loss_fct = CrossEntropyLoss()
1051
+ loss = loss_fct(pooled_logits, labels)
1052
+ elif self.config.problem_type == "multi_label_classification":
1053
+ loss_fct = BCEWithLogitsLoss()
1054
+ loss = loss_fct(pooled_logits, labels)
1055
+ if not return_dict:
1056
+ output = (pooled_logits,) + transformer_outputs[1:]
1057
+ return ((loss,) + output) if loss is not None else output
1058
+
1059
+ return SequenceClassifierOutputWithPast(
1060
+ loss=loss,
1061
+ logits=pooled_logits,
1062
+ past_key_values=transformer_outputs.past_key_values,
1063
+ hidden_states=transformer_outputs.hidden_states,
1064
+ attentions=transformer_outputs.attentions,
1065
+ )
1066
+
1067
+
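Note on the pooling in `FalconForSequenceClassification.forward`: the index of the token used for classification is the number of non-padding tokens minus one. The sketch below mirrors that rule for one row with plain lists; `last_token_index` is a hypothetical name. It also shows that the rule only finds the last real token when the row is right-padded.

```python
from typing import List, Optional


def last_token_index(input_ids: List[int], pad_token_id: Optional[int]) -> int:
    """Index of the position whose logits are pooled for one row."""
    if pad_token_id is None:
        return -1  # no padding information: take the final position
    non_pad = sum(1 for token in input_ids if token != pad_token_id)
    return non_pad - 1


right_padded = last_token_index([5, 6, 7, 0, 0], pad_token_id=0)
left_padded = last_token_index([0, 0, 5, 6, 7], pad_token_id=0)
no_pad_token = last_token_index([5, 6, 7], pad_token_id=None)
```

For the left-padded row the count-based index is 2, which points at the first real token rather than the last one, so inputs to this head should be right-padded.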
1068
+ @add_start_docstrings(
1069
+ """
1070
+ Falcon Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for
1071
+ Named-Entity-Recognition (NER) tasks.
1072
+ """,
1073
+ FALCON_START_DOCSTRING,
1074
+ )
1075
+ class FalconForTokenClassification(FalconPreTrainedModel):
1076
+ def __init__(self, config: FalconConfig):
1077
+ super().__init__(config)
1078
+ self.num_labels = config.num_labels
1079
+
1080
+ self.transformer = FalconModel(config)
1081
+ if getattr(config, "classifier_dropout", None) is not None:
1082
+ classifier_dropout = config.classifier_dropout
1083
+ elif getattr(config, "hidden_dropout", None) is not None:
1084
+ classifier_dropout = config.hidden_dropout
1085
+ else:
1086
+ classifier_dropout = 0.1
1087
+ self.dropout = nn.Dropout(classifier_dropout)
1088
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
1089
+
1090
+ # Initialize weights and apply final processing
1091
+ self.post_init()
1092
+
1093
+ @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
1094
+ @add_code_sample_docstrings(
1095
+ checkpoint=_CHECKPOINT_FOR_DOC,
1096
+ output_type=TokenClassifierOutput,
1097
+ config_class=_CONFIG_FOR_DOC,
1098
+ )
1099
+ def forward(
1100
+ self,
1101
+ input_ids: Optional[torch.LongTensor] = None,
1102
+ past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
1103
+ attention_mask: Optional[torch.Tensor] = None,
1104
+ head_mask: Optional[torch.Tensor] = None,
1105
+ inputs_embeds: Optional[torch.Tensor] = None,
1106
+ labels: Optional[torch.Tensor] = None,
1107
+ use_cache: Optional[bool] = None,
1108
+ output_attentions: Optional[bool] = None,
1109
+ output_hidden_states: Optional[bool] = None,
1110
+ return_dict: Optional[bool] = None,
1111
+ ) -> Union[Tuple[torch.Tensor], TokenClassifierOutput]:
1112
+ r"""
1113
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1114
+ Labels for computing the token classification loss. Indices should be in `[0, ...,
1115
+ config.num_labels - 1]`. Labels set to `-100` are ignored, because `CrossEntropyLoss` uses
1116
+ `ignore_index=-100` by default.
1117
+ """
1118
+
1119
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1120
+
1121
+ transformer_outputs = self.transformer(
1122
+ input_ids,
1123
+ past_key_values=past_key_values,
1124
+ attention_mask=attention_mask,
1125
+ head_mask=head_mask,
1126
+ inputs_embeds=inputs_embeds,
1127
+ use_cache=use_cache,
1128
+ output_attentions=output_attentions,
1129
+ output_hidden_states=output_hidden_states,
1130
+ return_dict=return_dict,
1131
+ )
1132
+
1133
+ hidden_states = transformer_outputs[0]
1134
+ hidden_states = self.dropout(hidden_states)
1135
+ logits = self.classifier(hidden_states)
1136
+
1137
+ loss = None
1138
+ if labels is not None:
1139
+ batch_size, seq_length = labels.shape
1140
+ loss_fct = CrossEntropyLoss()
1141
+ loss = loss_fct(
1142
+ logits.view(batch_size * seq_length, self.num_labels), labels.view(batch_size * seq_length)
1143
+ )
1144
+
1145
+ if not return_dict:
1146
+ output = (logits,) + transformer_outputs[2:]
1147
+ return ((loss,) + output) if loss is not None else output
1148
+
1149
+ return TokenClassifierOutput(
1150
+ loss=loss,
1151
+ logits=logits,
1152
+ hidden_states=transformer_outputs.hidden_states,
1153
+ attentions=transformer_outputs.attentions,
1154
+ )
1155
+
1156
+
1157
+ @add_start_docstrings(
1158
+ """
1159
+ The Falcon Model transformer with a span classification head on top for extractive question-answering tasks like
1160
+ SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`).
1161
+ """,
1162
+ FALCON_START_DOCSTRING,
1163
+ )
1164
+ class FalconForQuestionAnswering(FalconPreTrainedModel):
1165
+ def __init__(self, config):
1166
+ super().__init__(config)
1167
+ self.transformer = FalconModel(config)
1168
+ self.qa_outputs = nn.Linear(config.hidden_size, 2)
1169
+
1170
+ # Initialize weights and apply final processing
1171
+ self.post_init()
1172
+
1173
+ @add_start_docstrings_to_model_forward(FALCON_INPUTS_DOCSTRING)
1174
+ def forward(
1175
+ self,
1176
+ input_ids: Optional[torch.LongTensor] = None,
1177
+ attention_mask: Optional[torch.FloatTensor] = None,
1178
+ head_mask: Optional[torch.FloatTensor] = None,
1179
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1180
+ start_positions: Optional[torch.LongTensor] = None,
1181
+ end_positions: Optional[torch.LongTensor] = None,
1182
+ output_attentions: Optional[bool] = None,
1183
+ output_hidden_states: Optional[bool] = None,
1184
+ return_dict: Optional[bool] = None,
1185
+ ) -> Union[Tuple, QuestionAnsweringModelOutput]:
1186
+ r"""
1187
+ start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1188
+ Labels for position (index) of the start of the labelled span for computing the token classification loss.
1189
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
1190
+ are not taken into account for computing the loss.
1191
+ end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1192
+ Labels for position (index) of the end of the labelled span for computing the token classification loss.
1193
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
1194
+ are not taken into account for computing the loss.
1195
+ """
1196
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1197
+
1198
+ outputs = self.transformer(
1199
+ input_ids,
1200
+ attention_mask=attention_mask,
1201
+ head_mask=head_mask,
1202
+ inputs_embeds=inputs_embeds,
1203
+ output_attentions=output_attentions,
1204
+ output_hidden_states=output_hidden_states,
1205
+ return_dict=return_dict,
1206
+ )
1207
+
1208
+ sequence_output = outputs[0]
1209
+
1210
+ logits = self.qa_outputs(sequence_output)
1211
+ start_logits, end_logits = logits.split(1, dim=-1)
1212
+ start_logits = start_logits.squeeze(-1).contiguous()
1213
+ end_logits = end_logits.squeeze(-1).contiguous()
1214
+
1215
+ total_loss = None
1216
+ if start_positions is not None and end_positions is not None:
1217
+ # If we are on multi-GPU, splitting the batch adds a dimension
1218
+ if len(start_positions.size()) > 1:
1219
+ start_positions = start_positions.squeeze(-1)
1220
+ if len(end_positions.size()) > 1:
1221
+ end_positions = end_positions.squeeze(-1)
1222
+ # sometimes the start/end positions are outside our model inputs, we ignore these terms
1223
+ ignored_index = start_logits.size(1)
1224
+ start_positions = start_positions.clamp(0, ignored_index)
1225
+ end_positions = end_positions.clamp(0, ignored_index)
1226
+
1227
+ loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
1228
+ start_loss = loss_fct(start_logits, start_positions)
1229
+ end_loss = loss_fct(end_logits, end_positions)
1230
+ total_loss = (start_loss + end_loss) / 2
1231
+
1232
+ if not return_dict:
1233
+ output = (start_logits, end_logits) + outputs[2:]
1234
+ return ((total_loss,) + output) if total_loss is not None else output
1235
+
1236
+ return QuestionAnsweringModelOutput(
1237
+ loss=total_loss,
1238
+ start_logits=start_logits,
1239
+ end_logits=end_logits,
1240
+ hidden_states=outputs.hidden_states,
1241
+ attentions=outputs.attentions,
1242
+ )
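Note on the span loss in `FalconForQuestionAnswering.forward`: positions are clamped to `[0, sequence_length]`, and the value `sequence_length` itself is then passed to the loss as `ignore_index`. The following sketch shows which positions end up ignored; `clamp_positions` is a hypothetical name and works on plain lists.

```python
from typing import List, Tuple


def clamp_positions(
    positions: List[int], seq_length: int
) -> Tuple[List[int], List[bool]]:
    """Return clamped positions and a flag saying whether each one is scored."""
    ignored_index = seq_length  # one past the last valid token index
    clamped = [min(max(p, 0), ignored_index) for p in positions]
    scored = [p != ignored_index for p in clamped]
    return clamped, scored


clamped, scored = clamp_positions([3, 12, -2, 10], seq_length=10)
```

Positions at or beyond the sequence length are dropped from the loss, while negative positions are clamped to 0 and still scored as token 0.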
special_tokens_map.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "additional_special_tokens": [
+     ">>TITLE<<",
+     ">>ABSTRACT<<",
+     ">>INTRODUCTION<<",
+     ">>SUMMARY<<",
+     ">>COMMENT<<",
+     ">>ANSWER<<",
+     ">>QUESTION<<",
+     ">>DOMAIN<<",
+     ">>PREFIX<<",
+     ">>SUFFIX<<",
+     ">>MIDDLE<<"
+   ],
+   "eos_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": false,
+   "eos_token": "<|endoftext|>",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 2048,
+   "name_or_path": "tiiuae/falcon_tokenizer",
+   "special_tokens_map_file": null,
+   "tokenizer_class": "PreTrainedTokenizerFast"
+ }