Selenium, Tor and New Year's Resolutions

Published 22nd December 2019 at 06:37am (Last Updated 14th April 2020 at 06:36pm)

Every year I tell myself that I'll get serious about blogging. Every year, I start a post - sometimes two - crappy little drafts that I never get round to finishing. Something's always in the way. I think I'll aim for something shorter from now on. Shorter posts, more frequent.

I could talk about some of the awful (and in one case dangerous) people I had to deal with at my last job. That job isn't listed in my "about" page at the moment because I haven't had time to update the site.

Since I left RPS, I've been

a little

preoccupied.

Perhaps I'll discuss it all another time. Ideally at a more social hour (it's 3:09am).


 

Selenium


Did you know that you can run Selenium commands against the Tor Browser? I found this really cool library earlier this week that lets you do it. The repo contains a few examples, so it's definitely worth having a read through.

To try it out I'd suggest the following as prerequisites:


  • Python 3 (I mean, you could probably use a different language for this if you really wanted to but most of the examples I found online were already in Python - which I'm not even a fan of - *shrugs*)
  • Jupyter Notebook (just saves you having to run python fileName.py each time so not a requirement per se...it also doesn't even render in GitHub properly anymore...maybe you're better off leaving it out, idk)
  • Selenium (pip install selenium)
  • TB Selenium (pip install tbselenium)
  • The stem package (pip install stem)
  • Tor (duh!)


In your script, first add your imports:

import tbselenium.common as cm
from tbselenium.tbdriver import TorBrowserDriver
from tbselenium.utils import launch_tbb_tor_with_stem
from selenium.webdriver.common.keys import Keys
import time

Most of them are from the WebFP tor-browser-selenium library itself, but the Keys class is imported from the original Selenium WebDriver and, of course, time is a built-in Python module.

Once you've installed the Tor Browser, assign its directory to the tbb_dir variable in your script...

tbb_dir = "/absolute/path/to/tor-browser_en-US/"

...this directory will be referred to in the following line. Here you will launch a new Tor process with the stem package:

tor_process = launch_tbb_tor_with_stem(tbb_path=tbb_dir)
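If the path is wrong, you'll only find out once the launch falls over, so it can be worth checking the directory first. This is just a minimal sketch: check_tbb_dir is a hypothetical helper of mine (not part of tbselenium), and it assumes the Linux bundle layout, where the unpacked folder contains a Browser subdirectory.

```python
from pathlib import Path


def check_tbb_dir(tbb_dir):
    """Return tbb_dir as a Path if it looks like a Tor Browser bundle.

    Assumption: the unpacked Linux bundle contains a "Browser" folder.
    """
    root = Path(tbb_dir)
    if not root.is_absolute():
        raise ValueError("tbb_dir should be an absolute path: {}".format(tbb_dir))
    if not (root / "Browser").is_dir():
        raise FileNotFoundError("No Browser folder found in {}".format(tbb_dir))
    return root


# Usage (commented out because it needs a real bundle on disk):
# tbb_dir = str(check_tbb_dir("/absolute/path/to/tor-browser_en-US/"))
```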

The code that you want to run will need to take place within the following block:

with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
    # do selenium stuff

Fun fact about Selenium: it's pronounced "se-leh-nium". I know this because one of its inventors told me this...or perhaps it was a group of us that he told. It was probably a whole tech talk. It was a long time ago. My point is, the English language is dumb.

Now load the page you want to visit. Where should we go?

How about the Hidden Wiki?

with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
    driver.load_url("http://zqktlwi4i34kbat3.onion/wiki/index.php/Main_Page")

Note, make sure your browser's set to "Safest" in the security settings. Onion sites take forever and a day to load with the "Standard" settings for me. No idea why. Should probably care more but it's late (well, early) and I don't.

To fix this, open an instance of the Tor browser locally and change the settings manually like so:

[Screenshot: Tor Security Settings]

Then close the browser and in a terminal window, run:

sudo pkill tor

Running that command before running your finished script should also resolve the error described in this GitHub issue (OSError: Process terminated: Failed to bind one of the listener ports.) if you run into it.
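If you'd rather check before reaching for pkill, something like the following tells you whether anything is already listening on a local port. It's a rough sketch: port_in_use is a hypothetical helper of mine, and the 9250 in the usage comment is only my assumption about the SOCKS port that the stem setup uses, so check tbselenium.common in your own install for the real value.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


# Usage (9250 is an assumed port number, not taken from the post):
# if port_in_use(9250):
#     print("Something is already bound; a stray tor is probably still running")
```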

Next, add any of the actions that you would like Selenium to perform into the block. It's late (or early...it's 4:45am now) so let's keep this to a simple search. We can look for the term "pgp" as I still know next to nothing about it:

el = driver.find_element_by_name("search")

In the code above, we're looking for an element on the web page with the name "search". This is actually not the best way to do it...in fact, as the field itself looks like this...


[Screenshot: Hidden Wiki search input source code]


...you'd have probably been better off with something like...

el = driver.find_element_by_xpath("//input[@id='searchInput']")

Either one works though so whatever. The commands that follow it in the full script clear the field as a precaution (el.clear()), type the text into the search field (el.send_keys("pgp")) and hit the return key (el.send_keys(Keys.RETURN)).
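One caveat if you're on a newer Selenium: the find_element_by_* helpers were deprecated in Selenium 4 and removed in 4.3, in favour of find_element(By.NAME, "search"). Here's a rough sketch of a shim that copes with both. find_by_name is a hypothetical helper of mine, not part of either library, and it passes the plain string "name" (which is what By.NAME holds) so that it doesn't need the extra import.

```python
def find_by_name(driver, name):
    """Look up an element by its name attribute on old or new Selenium."""
    if hasattr(driver, "find_element_by_name"):
        # Older Selenium: the helper used elsewhere in this post
        return driver.find_element_by_name(name)
    # Selenium 4 style: "name" is the string value of By.NAME
    return driver.find_element("name", name)


# Usage inside the TorBrowserDriver block:
# el = find_by_name(driver, "search")
```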

Once you're done, always make sure you remember to kill the tor process:

tor_process.kill()

So your code should look like this:

import tbselenium.common as cm
from tbselenium.tbdriver import TorBrowserDriver
from tbselenium.utils import launch_tbb_tor_with_stem
from selenium.webdriver.common.keys import Keys
import time

# Path to the unpacked Tor Browser bundle
tbb_dir = "/absolute/path/to/tor-browser_en-US/"

# Launch a Tor process through stem
tor_process = launch_tbb_tor_with_stem(tbb_path=tbb_dir)

with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
    driver.load_url("http://zqktlwi4i34kbat3.onion/wiki/index.php/Main_Page")
    el = driver.find_element_by_name("search")
    time.sleep(1)
    el.clear()  # empty the field as a precaution
    el.send_keys("pgp")  # type the search term
    el.send_keys(Keys.RETURN)  # submit the search
    time.sleep(5)  # leave some time for the results to load

# Shut down the Tor process once the browser has closed
tor_process.kill()
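One thing to watch: if anything inside the with block throws, the script above never reaches tor_process.kill() and the process is left running. A small context manager gets around that. This is only a sketch, and killing is a hypothetical helper name of mine; it works with any object that has a kill() method.

```python
from contextlib import contextmanager


@contextmanager
def killing(process):
    """Yield process, then call process.kill() even if the body raised."""
    try:
        yield process
    finally:
        process.kill()


# Usage with the names from the script above:
# with killing(launch_tbb_tor_with_stem(tbb_path=tbb_dir)) as tor_process:
#     with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
#         driver.load_url("http://zqktlwi4i34kbat3.onion/wiki/index.php/Main_Page")
```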

There are other things you can do to improve the script. Running a headless Tor browser is good for when you want it to run in the background. To do that, first install xvfb. On Manjaro, I had to run:

sudo pacman -S community/python-xvfbwrapper
sudo pacman -S extra/xorg-xdpyinfo

Don't know what you'd have to do on anything else sadly.
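Whatever your distro's packages are called, you can at least check from Python whether the tools ended up on your PATH. A minimal sketch, assuming the executables are named Xvfb and xdpyinfo (that's my assumption about what the packages above provide); missing_tools is a hypothetical helper of mine.

```python
import shutil


def missing_tools(names):
    """Return the names from the list that can't be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Usage (the executable names are assumptions):
# missing = missing_tools(["Xvfb", "xdpyinfo"])
# if missing:
#     print("Install these before going headless:", ", ".join(missing))
```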

Then import the xvfb helpers from the tbselenium package:

from tbselenium.utils import start_xvfb, stop_xvfb, launch_tbb_tor_with_stem

And wrap your code in a block like the following:

xvfb_display = start_xvfb()
with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
    # do selenium stuff
stop_xvfb(xvfb_display)
tor_process.kill()

You can also use the WebDriverWait class to wait for the page to load a specific element before running a command. In a different script, I thought it made sense to create a helper method like this one:

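from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def wait_for_it(driver, xpath):
    # Wait up to 10 seconds for the element to become clickable;
    # until() returns the element once the condition is met
    return WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )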

Here driver is the driver from before and xpath would be a string like "//input[@id='searchInput']". This saves us using time.sleep() all the time, which is a little tedious if the page takes 10 seconds to load some of the time and 3 seconds most of the time, &c.
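If you're wondering why this beats a fixed sleep, the idea is roughly "keep checking until it's ready or you run out of time". Here's a plain-Python sketch of that idea. To be clear, this is not Selenium's actual implementation; wait_until and its parameters are hypothetical names of mine.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Gave up after {} seconds".format(timeout))
        time.sleep(poll)
```

A page that's ready after 3 seconds costs you about 3 seconds, and a slow one gets the full 10 before the script gives up.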


 

Other news:


  • I've started using Manjaro which, from what I've heard, is like Baby Arch. I might post if I stumble upon something interesting but idk.

[Image: Baby Linux]

  • To all of my fellow countrymen who voted the wrong way on the 12th and those who helped spread vicious lies about a good man prior to that: when the UK slips into third-world levels of poverty because the man that you elected sold off or drove out everything and everyone that made this country worth a damn, I hope that you can live with knowing that your consent made it all possible.

 

Ok, signing off for the night (well, morning) as it's 6:37 and I was meant to go to sleep over four hours ago.


Posted by Anonymous on 8th August 2021 at 17:50:32

HELLO

Posted by Jess on 8th August 2021 at 18:14:56

...Hi?

Posted by Anonymous on 8th August 2021 at 18:20:47

I can't create new circuits :( can you help me, I tried to sendkeys to do that Ctrl+shift+L but it doesnt work.

Posted by Jess on 8th August 2021 at 20:28:54

I honestly don't know. What you're asking about is out of scope for the blog post and from what I've read so far, it might not even be possible to do with Selenium. Are you sure you couldn't just try creating a new identity for the work you're doing instead or does it absolutely have to be a new circuit?


If you haven't done so already, I would suggest taking a look at the ActionChains class in Selenium when attempting to press two keys at once because it provides the #key_down and #key_up methods whereas #send_keys will usually perform each keystroke one after the other. You might still have trouble using those in a remote controlled Tor browser however as I personally struggled with that a little earlier. It should work with other browsers though.


Sorry that I couldn't be more helpful - I haven't really had to use this library in a while and have never really needed to do anything like that before.

Posted by Anonymous on 9th August 2021 at 13:12:33

thanks for the tips.I didn't think you would answer me but u did thanks.I've tried action chains but doesnt work.I will make loop that closes and opens new browser.Thanks again.u good 👍, I like ur blog.☺️