# Tutorial: Clicking Buttons to Load More Content with Crawl4AI

## Introduction

When scraping dynamic websites, it’s common to encounter “Load More” or “Next” buttons that must be clicked to reveal new content. Crawl4AI provides a straightforward way to handle these situations using JavaScript execution and waiting conditions.

In this tutorial, we’ll cover two approaches:

1. **Step-by-step (Session-based) Approach:** Multiple calls to `arun()` to progressively load more content.
2. **Single-call Approach:** Execute a more complex JavaScript snippet inside a single `arun()` call to handle all clicks at once before the extraction.

## Prerequisites

- A working installation of Crawl4AI
- Basic familiarity with Python’s `async`/`await` syntax

## Step-by-Step Approach

Use a session ID to maintain state across multiple `arun()` calls:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

js_code = [
    # This JS finds the "Next" button and clicks it (if present)
    "const nextButton = document.querySelector('button.next'); nextButton && nextButton.click();"
]

wait_for_condition = "css:.new-content-class"

async def main():
    async with AsyncWebCrawler(headless=True, verbose=True) as crawler:
        # 1. Load the initial page
        result_initial = await crawler.arun(
            url="https://example.com",
            cache_mode=CacheMode.BYPASS,
            session_id="my_session"
        )

        # 2. Click the 'Next' button and wait for new content
        result_next = await crawler.arun(
            url="https://example.com",
            session_id="my_session",
            js_code=js_code,
            wait_for=wait_for_condition,
            js_only=True,
            cache_mode=CacheMode.BYPASS
        )
        # `result_next` now contains the updated HTML after clicking 'Next'

asyncio.run(main())
```

**Key Points:**

- **`session_id`**: Keeps the same browser context open across calls.
- **`js_code`**: Executes JavaScript in the context of the already loaded page.
- **`wait_for`**: Ensures the crawler waits until the new content is fully loaded.
- **`js_only=True`**: Runs the JS in the current session without reloading the page.

By repeating the `arun()` call and adjusting the `js_code` as needed (e.g., clicking different modules or pages), you can iteratively load all the desired content, as in the sketch below.
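For instance, here is a minimal sketch of such a pagination loop. It assumes the same `button.next` selector and `css:.new-content-class` wait condition as above; the `max_pages` safety cap and the unchanged-HTML stop check are illustrative heuristics, not Crawl4AI features:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

async def crawl_all_pages(url: str, max_pages: int = 10) -> list[str]:
    """Click 'Next' repeatedly in one session, collecting each page's HTML."""
    pages = []
    async with AsyncWebCrawler(headless=True, verbose=True) as crawler:
        # Initial load establishes the session
        result = await crawler.arun(
            url=url,
            session_id="pagination_session",
            cache_mode=CacheMode.BYPASS
        )
        pages.append(result.html)

        for _ in range(max_pages - 1):
            result = await crawler.arun(
                url=url,
                session_id="pagination_session",
                js_code=[
                    "const btn = document.querySelector('button.next'); btn && btn.click();"
                ],
                wait_for="css:.new-content-class",  # assumed marker for freshly loaded content
                js_only=True,  # reuse the live page instead of reloading it
                cache_mode=CacheMode.BYPASS
            )
            # Heuristic stop condition (assumption): unchanged HTML means the
            # 'Next' button is gone or no longer loads anything new
            if result.html == pages[-1]:
                break
            pages.append(result.html)
    return pages

# pages = asyncio.run(crawl_all_pages("https://example.com"))
```

If your target page offers a clearer end-of-content signal (e.g., the “Next” button disappearing or gaining a `disabled` attribute), check for that instead of comparing HTML snapshots.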
## Single-call Approach

If the page allows it, you can run a single `arun()` call with a more elaborate JavaScript snippet that:

- Iterates over all the modules or “Next” buttons
- Clicks them one by one
- Waits for content updates between each click
- Returns control to Crawl4AI for extraction once done

Example snippet:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

js_code = [
    # Example JS that clicks every module in sequence, pausing after each click
    """
    (async () => {
        const modules = document.querySelectorAll('.module-item');
        for (let i = 0; i < modules.length; i++) {
            modules[i].scrollIntoView();
            modules[i].click();
            // Wait for each module's content to load; adjust 100 ms as needed
            await new Promise(r => setTimeout(r, 100));
        }
    })();
    """
]

async def main():
    async with AsyncWebCrawler(headless=True, verbose=True) as crawler:
        result = await crawler.arun(
            url="https://example.com",
            js_code=js_code,
            wait_for="css:.final-loaded-content-class",
            cache_mode=CacheMode.BYPASS
        )
        # `result` now contains all content after every module has been clicked in one go

asyncio.run(main())
```

**Key Points:**

- All interactions (clicks and waits) happen before the extraction.
- Ideal for pages where all steps can be done in a single pass.

## Choosing the Right Approach

- **Step-by-Step (Session-based)**:
  - Good when you need fine-grained control or must dynamically check conditions before clicking the next page.
  - Useful when each step’s result determines what to do next at runtime.
- **Single-call**:
  - Perfect if the sequence of interactions is known in advance.
  - Cleaner code if the page’s structure is consistent and predictable.

## Conclusion

Crawl4AI makes it easy to handle dynamic content:

- Use session IDs and multiple `arun()` calls for stepwise crawling.
- Or pack all actions into one `arun()` call if the interactions are well-defined upfront.

This flexibility ensures you can handle a wide range of dynamic web pages efficiently.
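## Appendix: A “Load More” Button in One Call

As a closing sketch, the single-call approach also covers the common case of a “Load More” button that disappears once everything is loaded. The `button.load-more` and `css:.item` selectors, the 20-click safety cap, and the 500 ms pause are all assumptions to adapt to your target page:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode

# Clicks an assumed 'button.load-more' until it disappears (capped at 20 clicks
# for safety), pausing briefly after each click so new items can render
load_more_js = [
    """
    (async () => {
        for (let i = 0; i < 20; i++) {
            const btn = document.querySelector('button.load-more');
            if (!btn) break;  // button gone: everything is loaded
            btn.scrollIntoView();
            btn.click();
            await new Promise(r => setTimeout(r, 500));  // adjust for your page
        }
    })();
    """
]

async def main():
    async with AsyncWebCrawler(headless=True, verbose=True) as crawler:
        result = await crawler.arun(
            url="https://example.com",
            js_code=load_more_js,
            wait_for="css:.item",  # assumed selector for the loaded items
            cache_mode=CacheMode.BYPASS
        )
        # `result.html` now includes every batch the button revealed

asyncio.run(main())
```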