There has long been disagreement about the best wording for telling people how to interact with interface elements such as links and buttons. Will Click make enough sense on a touch screen or with voice commands? Is it too dated for young people on mobile devices who have never used a mouse? Is Tap too specific to touchscreens and too foreign for veterans of the PC era?
Part of the problem is that many websites and apps support a mixture of mouse, keyboard, and direct input on the screen with fingers or a stylus, and more. Often there is no reliable way to detect which device is being used, or whether more than one input mechanism is available (think of an iPad with a Bluetooth keyboard, or assistive technologies such as eye-gaze detection and keyboard accessibility). When touch, mouse, and other mysterious input methods are all allowed, what is the most helpful and accurate way to describe the interaction?
Click/tap combo is impersonal and indefinite.
You could certainly include both Click and Tap, but that might confuse people into trying multiple things, some of which may not be supported. Offering multiple choices makes the user slow down and think about what to do, which might reduce the completion rate. It takes more cognitive overhead to decide which one to act on, and it raises the question of why you wouldn’t include every option:
Click, tap, set with your keyboard, command with your voice, execute with your Wii remote, or direct your personal assistant to pursue the Submit button.
It also comes off as though you have not taken the time to get to know your users, and it makes a brand’s voice sound indecisive. Using Click by itself isn’t so bad, but it still comes off as outdated.
Select can have no action.
Select seems pretty smart, but it can be misconstrued as highlighting content, such as selecting text by dragging across a range (for copy and paste). Select can also mean putting an object in focus, which readies it for action but doesn’t actually carry anything out. It’s like selecting a place to fish without ever casting your line.
If the context is clear enough, you can probably get away with Select.
Press can be permanent and misrepresent.
Press has the advantage of being colloquially appropriate for both touch and traditional computing, but still fails to reflect a neutral interaction method. Imagine someone has hacked a NES zapper gun so it works on a web browser by aiming and Pulling a trigger… We can’t rely on the fact that all user actions will be “pushing” against a link.
Press does address both tap and click, but it still leaves ambiguity about the keyboard and other input types. If I say “Press Delete,” you have no idea whether that refers to a text link in the UI or to the Delete key on the keyboard. If I am blind and use voice input to make my choice, I am not pressing anything. Just realize that Press falsely describes other modes of access, like shouting into a microphone, aiming a remote, or throwing at a target. This might be a smaller issue because those using assistive technology are in the minority and may be familiar with the metaphor.
Additionally, Press fails to clarify whether or not to hold the press. Since no “unpress” is mentioned, people may never let go! The same issue occurs with Touch, the lengthier version of Tap. Touch doesn’t imply release, whereas Tap does.
If you want to optimize for primary interactions and the duration of the “press” isn’t an issue, you can stick with this one.
Activate doesn’t explain how to.
While Activate is an honest way of universally describing the act of engaging a link, it is also a bit too technical and robotic. It makes sense in an engineer’s spec or for clarity in code, but I just can’t imagine people using it in natural conversation.
It is also ambiguous with checkboxes. Activating something suggests that it has opposing states and can be deactivated. It also doesn’t explain how something becomes activated. Imagine a help manual that explains, “To import an image, activate the Upload button.” Does that mean I have to find somewhere in the settings to switch the Upload button from disabled to enabled? Do I have to right-click and choose Activate in the context menu? Will I need to edit the code because it’s deactivated? Maybe I have to mix chemical compounds to awaken neutralized solutions?
Use Activate if your participants are technically inclined and you are performing your best C-3PO impression.
Other novel options
If you don’t mind being liable when people break their screens, you could ask them to wear boxing gloves to Hit the pressure-sensitive Submit button. I don’t think most people would get carried away with that one, but if you have angry customers or they run into errors, you never know. If you are a magician or a ghostbuster, you could Summon the Submit button. Officers in prison may suggest that you Execute. On a more serious note, Choose is good for describing a decision, especially when there are multiple options to decide among.
To loop the peanut-butter animation, choose Jif or Gif.
Describe the end result
You also have the option to skip the issue altogether. Take a closer look at labels, surrounding text, and visual treatment to increase the self-evidence of an interaction. You might be able to abstain from referring to the input type by stating the activity that will take place and formatting it properly:
“You are seconds away from a free iPad. Verify your account.”
If the context is not clear enough, you could try “Follow the link below to [do action].” Similar action phrases include Go to and Visit.
The best choice
If you know there is a single allowed way to interact, stick to what is recognizable and familiar to the medium and its audience (e.g., pressing the physical Power button on your phone). Otherwise, aim for wording broad enough to encompass every mode of interaction you support, yet specific enough to make clear what is allowed. Clarity is king here. If the interaction is dead obvious, you can probably get away without referencing the physical action and describe the outcome instead. It’s better to be context-oriented and consider the relevant mental model. When giving directions, it’s easy to get caught up in little details or to have your head too far in the clouds.
I leave you with a breakfast instruction parable, not unlike “The Three Bears,” given to someone who may want to eat and is unfamiliar with the situation:
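As a rough illustration of that rule of thumb, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `instruction` function, the mode names, and the verb table are hypothetical and not part of any library or standard. It uses a familiar verb only when exactly one known input mode is allowed, and otherwise falls back to describing the outcome.

```python
# Hypothetical mapping from a single allowed input mode to its familiar verb.
# The mode names and verbs are illustrative assumptions, not a standard.
FAMILIAR_VERBS = {
    "mouse": "Click",
    "touch": "Tap",
    "hardware_button": "Press",
}


def instruction(control, outcome, modes):
    """Build instruction text for a control.

    control: visible label of the control, e.g. "Verify"
    outcome: what happens, phrased as a verb phrase, e.g. "verify your account"
    modes:   set of input modes known to be allowed (may be empty or unknown)
    """
    if len(modes) == 1:
        (mode,) = modes
        verb = FAMILIAR_VERBS.get(mode)
        if verb is not None:
            # One known way to interact: use the wording familiar to it.
            return f"{verb} {control} to {outcome}."
    # Mixed, unknown, or unrecognized input: describe the result, not the gesture.
    return outcome[:1].upper() + outcome[1:] + "."


print(instruction("Verify", "verify your account", {"touch"}))
print(instruction("Verify", "verify your account", {"touch", "mouse"}))
```

With only touch allowed, this prints “Tap Verify to verify your account.” With both touch and mouse allowed, it prints “Verify your account.” and leaves the gesture to the reader.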
Too programmatic
Obtain 4-inch spoon. Balance spoon to capture cereal and milk from bowl. While perpendicular to the x-axis, catapult spoon while easing acceleration vertically toward open mouth belonging to you. Once contents of spoon are 100% inside mouth, close mouth to secure nourishment. Eject spoon from mouth and […]
This spells things out accurately, but is not quickly intelligible.
Too high-level
Reach for edible ingredients.
This statement is true, but it may lead novices astray. Assume users will make mistakes when instructions are on target but ambiguous.
Just right
You can find raisin bran or bagels in the pantry next to the refrigerator. Utensils are in the top drawer to the right of the sink.
This has the right amount of information to be helpful without being distracting.