The thing to understand is that there can be multiple reasons it's not working consistently. You might actually be dealing with a combination of four separate issues that all show up as the same symptom, so it's hard to tell what the cause is. I'll give examples.
*You (not actually you, but anybody) don't remember the exact commands, or don't realize that not every command works in every situation. "Xbox" and "Xbox Watch TV" (for example) are two separate phrases. If you only say "Xbox", you will generally have to follow up with "Select", because you didn't give it a specific command the first time. You can't say "Xbox", wait, and then say "Watch TV", because by saying only "Xbox" you have already moved yourself into a different voice menu. "Why can't it just understand the context I'm thinking of every time!" Well, because it's not artificial intelligence, and it (like any voice system) needs rules to tell the difference between you saying "Xbox" in a conversation and you actually giving it a command. As for your example of "Just say Xbox Turn off", well, that does work. When I say it to mine, it turns off my Xbox One, my 360, my TV, and my soundbar. All four respond to that because I have my stuff set up the way MS intended: all in one, through the Xbone. But not everyone has their stuff set up "right". To add to the confusion, Google decided to be a jerk, and their YouTube app doesn't follow any of the standard Xbox voice commands. To use voice commands in the YouTube app you actually have to say "YouTube". That's Google's fault, not MS's.
*It's not calibrated properly. If it isn't, it will definitely have a hard time hearing you; even things like room acoustics can mess it up. That's why you have to turn the volume up loud when you calibrate it: it needs to get a feel for the shape of the room, how sound travels, etc.
*There is too much ambient noise. If your TV sound isn't being sent through the Xbox then guess what: all that sound from the movie you're watching or the game you're playing directly interferes with the Xbox's ability to hear you. If all sound is routed through the Xbox, it knows what it is playing and can cancel that sound out. This may not be your issue, but I know it is an issue for some. Take that commercial that just aired, where people's Xboxes respond to the voice commands in the commercial. That wouldn't happen if the sound was coming through the Xbox itself. That's the way it was designed. If you set yours up another way, you make it hard for it to tell whether something is a legit command or not. And yes, if other people in the room are talking, or the Kinect is near something else making a lot of noise, that's going to get in the way too. Same as with phones and tablets.
*It's still voice command, and that's not perfect on anyone's hardware yet. It's just how tech is right now. No voice system understands context perfectly every time, and until we have real, actual artificial intelligence that's just not going to happen. We all have this issue; it's a by-product of the time we live in. That's why we have to learn the commands and the proper way to use them. It's not an MS issue, though they are adding more contextual understanding/natural language in the future. However, if people can't be bothered to learn how to use the tech they're given, then it's on them. We all know that person who can never figure out how to do something on their smartphone. It's because they refused to spend the time to learn it, read the manual, etc. That doesn't mean their smartphone is bad; they're just lazy. We live in a high-tech world, and people are absolutely expected to learn their stuff.
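
To make the first point more concrete, here's a rough Python sketch of the "two separate phrases" behavior. To be clear, this is just my mental model, not how the Xbox is actually implemented: the `VoiceMenu` class, the state names, and the two command sets are all made up for illustration and are not the real command list.

```python
# Toy model of the voice-menu rules described above (hypothetical, not MS code).
FULL_COMMANDS = {"watch tv", "turn off"}  # illustrative: only valid right after "Xbox" in one phrase
MENU_COMMANDS = {"select"}                # illustrative: only valid once "Xbox" alone opened the menu


class VoiceMenu:
    def __init__(self):
        self.state = "idle"

    def hear(self, utterance):
        words = utterance.lower().strip()
        if self.state == "idle":
            if words == "xbox":
                # Saying only "Xbox" moves you into a different voice menu.
                self.state = "menu"
                return "menu opened"
            if words.startswith("xbox "):
                command = words[len("xbox "):]
                if command in FULL_COMMANDS:
                    return "run: " + command
            # Anything without the "Xbox" prefix is treated as conversation.
            return "ignored"
        # state == "menu": only menu commands are accepted here.
        if words in MENU_COMMANDS:
            self.state = "idle"
            return "run: " + words
        return "ignored"


kinect = VoiceMenu()
print(kinect.hear("Xbox Watch TV"))  # run: watch tv
print(kinect.hear("Xbox"))           # menu opened
print(kinect.hear("Watch TV"))       # ignored
print(kinect.hear("Select"))         # run: select
```

The point of the sketch is just that "Xbox" followed by a pause and then "Watch TV" takes a different path than "Xbox Watch TV" said as one phrase, which is why the first one looks like it "isn't working".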