
While some caveats about implementation might be appropriate, I think if you're trying to compare the merits of algorithms as algorithms, the meat of it is in the optimal case. Once you know the theoretical boundaries of performance, there's another set of problems in trying to get your implementation to approximate that, but you're never going to exceed your theoretical bound.

I think it's reasonable to assume a normal, but not best possible, implementation, unless you are already aware of special scenarios like the one you constructed. The difference between the two should be a constant factor. If you always assume the worst, as in your example, I doubt you could do any big-O calculations at all.

I think it's okay to make assumptions, so long as you state them where there's any cause for doubt. I would assume without statement that adding fixed-size integers is a constant-time operation, but as there are so many wildly different implementations of lists (depending on what you need), it's worth stating the assumptions there. Likewise, adding variable-size integers would be worth calling out explicitly, particularly because it may not be obvious when just reading through it relatively casually.
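A minimal sketch of why this assumption is worth stating, using Python's arbitrary-precision integers. The word count below is a rough proxy I made up for illustration, assuming 64-bit machine words; the point is only that "add two numbers" is constant-time for fixed-size values but grows with operand size for variable-size ones.

```python
def add_cost_in_words(a: int, b: int) -> int:
    # Rough proxy for the work an addition does: the number of
    # machine words needed to hold the larger operand. For
    # word-sized integers this is 1 (constant time); for big
    # integers it grows linearly with the bit length.
    bits = max(a.bit_length(), b.bit_length())
    return max(1, (bits + 63) // 64)  # assuming 64-bit words

print(add_cost_in_words(3, 4))        # fixed-size: 1 word of work
print(add_cost_in_words(2**1000, 1))  # variable-size: 16 words of work
```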
That way when you consider the algorithm in a concrete setting, you can test the assumptions before starting the implementation. It's always nice if the API you're working with (collections etc) states the complexity of operations. Java and .NET are both getting quite good on this front, IIRC.

Interesting question.
In my opinion, the steps involved in the algorithm in question should not be complex algorithms themselves. In other words, if "add x to the list of xs" is a complex operation, it needs to be fully specified, not just stated in a single sentence. If it is expressed as a single, straightforward sentence, then assume it is appropriately fast (e.g. O(1) or O(log n), depending on the apparent operation).
Another case where this is clear is string building. Sure, these days we have the StringBuilder class and understand the problem well. But it was only a few years ago that I saw an application brought to its knees because the developer coded a simple "append thousands of small strings together to make a big string", where each append did a full memory copy.
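A sketch of the two approaches in Python, assuming immutable strings (as in Python, Java, and .NET): in the worst case each `+=` copies the whole accumulated string, making the naive loop quadratic in total characters copied, while a buffered build is linear.

```python
def build_naive(parts):
    result = ""
    for p in parts:
        result += p          # may copy all of `result` on each append: O(n^2) total
    return result

def build_buffered(parts):
    return "".join(parts)    # single pass over all characters: O(n) total

parts = ["x"] * 10000
assert build_naive(parts) == build_buffered(parts)
```

(CPython happens to optimize some `+=` cases in place, which is exactly the kind of implementation detail the answers here suggest you state rather than rely on.)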

I believe the first step of doing big-O calculations is to decide which actions you consider elementary. To do that, you can take the common point of view for standard actions (+, -, *, / are elementary, and so are insertions into lists, and so forth).
But in particular cases (those you mentioned, for instance), these actions are no longer elementary and you have to make new assumptions. You can make them using simple tests you have tried, or just from reasoning. Anyway, whatever works for you is good!

The answer is to look at your use cases, and analyze in terms of them, rather than in the absolute.
What kind of data are you going to be consuming in the real world? If you're going to be crunching longs, assume constant time addition. If you're going to be crunching lots of unbounded BigNums, that's not a good assumption.
Does a list fit with the rest of your algorithm, or do you need lots of random access? An O(n) array copy is a worst-case scenario, but is it legitimate to amortize your analysis and consider what happens if you double the array size each time (giving you an amortized O(1) cost per append)?
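The doubling argument above can be sketched with a minimal dynamic array (the class and its counter are illustrative, not any particular library's implementation). Any single append may trigger an O(n) copy, but doubling the capacity keeps the total copy work over n appends at O(n), i.e. O(1) amortized per append.

```python
class DynArray:
    """Toy dynamic array that doubles its capacity when full."""

    def __init__(self):
        self._cap = 1
        self._size = 0
        self._data = [None] * self._cap
        self.copies = 0          # total elements copied, for inspection

    def append(self, x):
        if self._size == self._cap:
            # Full: double the capacity and copy everything over (O(n)).
            self._cap *= 2
            new_data = [None] * self._cap
            for i in range(self._size):
                new_data[i] = self._data[i]
                self.copies += 1
            self._data = new_data
        self._data[self._size] = x
        self._size += 1

a = DynArray()
for i in range(1000):
    a.append(i)
print(a.copies)  # 1023 = 1 + 2 + 4 + ... + 512: linear in the number of appends
```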
Being able to answer these questions in terms of your own algorithms makes you a good algorithmist. Being able to answer them in terms of algorithms you're considering for a project you're working on makes you a responsible implementer.
If your algorithm is being designed purely in the abstract, then be straightforward and explicit in your assumptions. If you assume constant time addition, just mention it. If you're doing amortized analysis, be sure you mention that.

Assume fair to decent performance. If no guarantees are provided in some documented form, assume as little as possible, but it should be safe to expect that the basic data structures and their operations perform as taught by the computer science department.
But the language you're running on does matter somewhat.
A good C programmer is able to compile the code in his head; he can glance at the code and get a fairly good approximation of what it would look like in machine code.
A Haskell programmer can't do that; it's the exact opposite. A Haskell programmer is going to care more about the complexity of his functions, and with good reason.
However, the Haskell programmer will never write code that outperforms what the C programmer writes. It's a common misunderstanding that this is the case. Programmers who know how to write fast code for modern hardware are superior on the performance front. However, it is fair to assume that the Haskell programmer will be more productive.
There's a sizable overhead to any programming language, and notably to managed environments. This doesn't make them slow, but it does make them slower. The less hardware-specific the code, the slower the execution.
There are plenty of ways to write code that eliminates performance bottlenecks without modifying your original algorithm, and a lot of this has to do with how the hardware is expected to execute your code.
What you assume about code and its performance has to be based on measurements and profiling. It's simply not fair to disregard hardware performance, which big-O notation does (I'll admit it's very theoretical, but not that practical).

I don't think it is helpful, when discussing the efficiency of algorithms, to go into the details of the different languages that can be used (or the OS, or the hardware architecture, or ...).
More generally, it's always good to isolate the different aspects of a question, even if they influence each other.
My experience is that in most cases, when it comes to complex algorithms, the most efficient one will work well in most language implementations. Optimizing for the language used is, in most cases, a more detailed matter. When discussing the algorithm in pseudocode, the details should express the "what" in the most understandable way.
Your examples are such details: parts used in complex algorithms that can be discussed and adjusted during implementation in some language. When discussing those details, pseudocode can be used, but the syntax of the actual language(s) would do a better job.

I usually assume the best implementation, and explicitly mention the assumed implementation if necessary (e.g. a HashSet, or a BST). I would define the data structures being used and not leave them ambiguous (i.e. I'd clearly state that this is an array or a linked list). This gives a clear view of the order of any internal operations.
I also consider basic, elementary operations to be O(1), or otherwise I clearly state it. Adding two numbers is O(1), unless we know that our numbers will overflow primitive data types. For example, when assessing a shortest-path algorithm, you don't consider the addition of costs to be a heavy operation. In my opinion, we assume that all similar algorithms in one category will have the same problem if numbers are larger than a certain bound, so they will still be comparable.
Note that we (should) use big-O notation to compare algorithms rather than to calculate their running time. For this purpose, simple operations that need to be performed in all algorithms of a certain category (e.g. addition) can safely be assumed to be O(1). It's also acceptable to say that the cost of mathematical and logical operations is almost the same. We say a linear search is O(n), and a linear sum is also O(n). So, following this logic, all elementary operations can be thought of as O(1).
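Stating the data structure, as suggested above, is what pins down the order of an internal operation. A small sketch (the helper names are mine, for illustration): the same membership test is O(n) on a list but expected O(1) on a hash set.

```python
def contains_list(items: list, x) -> bool:
    for item in items:       # linear search: O(n) comparisons
        if item == x:
            return True
    return False

def contains_set(items: set, x) -> bool:
    return x in items        # hash lookup: expected O(1)

data = list(range(100000))
assert contains_list(data, 99999)       # scans all 100000 elements
assert contains_set(set(data), 99999)   # one hash lookup
```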
As a final note, what if we write an algorithm that requires 128-bit, or even 256-bit, integers? I believe we should not assume it will be implemented with a BigInteger, because in a few years these data types will probably be mainstream. Consider the same idea for algorithms invented in the 1960s, and you'll see how the same logic applies.

I would also assume the best implementation. It usually doesn't matter what it really is unless it becomes a bottleneck, in which case you should be using a better data structure anyway.

In some applications, hash-table inserts and reads are considered O(1) as well. It depends on what your baseline is.
For something even as 'simple' as sorting, if you're comparing sorts of large data sets that won't fit in memory, it might be worth considering hard-disk access time as well. In that case, merge sort is much better than almost all other algorithms.
In other cases, specialized hardware may make some algorithms perform much better, by allowing things to be done in constant time that might normally take O(n).

Not exactly an answer to the question, but this question reminds me how lucky we are that somebody already invented RAM (random-access memory). The question would be much more difficult without it.
I think big-O notation is not an absolute property of the system. Any big-O statement needs to come with assumptions about what kind of input is of interest and expected to grow.