pypdf2
docx2txt
transformers
torch
streamlit
pandas