A detailed explanation of out-of-range subscripts in Python sequence slices

Preamble

Slicing is a frequently used syntax in Python, whether for tuples, lists, or strings. The general form is:

sequence[ilow:ihigh:step] # ilow, ihigh and step may each be omitted; for simplicity, step is left out of the discussion for now.

Let's start with a simple demonstration of usage:

sequence = [1, 2, 3, 4, 5]
ilow, ihigh = 1, 3
sequence[ilow:ihigh]  # From ilow up to ihigh-1: [2, 3]
sequence[ilow:]       # From ilow to the end: [2, 3, 4, 5]
sequence[:ihigh]      # From the head up to ihigh-1: [1, 2, 3]
sequence[:]           # Copy the entire list

The syntax is concise, easy to understand, and simple to use day to day. Still, there are a few rules most of us habitually follow when using it:

ilow and ihigh are both less than the length of the sequence.
ilow < ihigh

In most cases, following these rules is the only way to get the result we expect. But what if we don't follow them? What happens to the slice?

Whether we're working with tuples, lists, or strings, when we want to retrieve an element, we use the following syntax.

sequence = [1,2,3,4,5]
print sequence[1] # Output 2
print sequence[2] # Output 3

The 1 and 2 above are called subscripts. Whether the object is a tuple, a list, or a string, we can use a subscript to retrieve the corresponding value; but if the subscript is outside the object's range, an index exception (IndexError) is raised.

sequence = [1,2,3,4,5]
print sequence[15] 

### Output ###
Traceback (most recent call last):
 File "", line 2, in <module>
 print sequence[15]
IndexError: list index out of range
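
When an out-of-range subscript is a possibility, the IndexError can be caught explicitly. The following is only a minimal sketch: the helper name get_or_default is made up for this illustration, and print is written in call form so the snippet runs on both Python 2.7 and Python 3.

```python
def get_or_default(seq, index, default=None):
    """Return seq[index], or default when the subscript is out of range."""
    try:
        return seq[index]
    except IndexError:
        return default

sequence = [1, 2, 3, 4, 5]
print(get_or_default(sequence, 1))       # 2
print(get_or_default(sequence, 15, -1))  # -1
```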

What about slicing? The two syntaxes are very similar. Suppose ilow and ihigh are 10 and 20 respectively; what is the result?

Reproducing the scene

# version: python2.7

a = [1, 2, 3, 5]
print a[10:20] # Will this raise an exception?

Seeing 10 and 20, both well beyond the length of the sequence a, we would expect from the previous code, or from past experience, that this also raises an IndexError. Let's open a terminal and test it:

>>> a = [1, 2, 3, 5]
>>> print a[10:20]
[]

The result is [], which is rather interesting. Does this happen only with lists? What about strings and tuples?

>>> s = '23123123123'
>>> s[400:2000]
''
>>> t = (1, 2, 3,4)
>>> print t[200: 1000]
()

The results are similar to those of a list, returning empty results for each.
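
The contrast between the two syntaxes can be checked side by side. The sketch below is only an illustration (the variable names are arbitrary); for each sequence type it records what an out-of-range slice returns and whether an out-of-range single subscript raises IndexError.

```python
samples = [[1, 2, 3, 5], '23123123123', (1, 2, 3, 4)]

results = []
for seq in samples:
    sliced = seq[400:2000]   # out-of-range slice: returns an empty sequence
    try:
        seq[400]             # out-of-range single subscript
        raised = False
    except IndexError:
        raised = True
    results.append((type(seq).__name__, sliced, raised))

for row in results:
    print(row)
```

Each row pairs the type name with its empty slice result and a flag showing that the single subscript did raise.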

No IndexError at all; an empty result is returned directly. So although the two syntaxes look alike, what happens behind them must be different. Let's try to explain this result.

Principle analysis

Before explaining the result, let's see how Python handles slicing, with the help of the dis module.

############# Slice ################
[root@iZ23pynfq19Z ~]# cat 
a = [11,2,3,4]
print a[20:30]

#Results.
[root@iZ23pynfq19Z ~]# python -m dis  
 1   0 LOAD_CONST    0 (11)
    3 LOAD_CONST    1 (2)
    6 LOAD_CONST    2 (3)
    9 LOAD_CONST    3 (4)
    12 BUILD_LIST    4
    15 STORE_NAME    0 (a)

 2   18 LOAD_NAME    0 (a)
    21 LOAD_CONST    4 (20)
    24 LOAD_CONST    5 (30)
    27 SLICE+3    
    28 PRINT_ITEM   
    29 PRINT_NEWLINE  
    30 LOAD_CONST    6 (None)
    33 RETURN_VALUE 

############# Single subscript fetch ################
[root@gitlab ~]# cat 
a = [11,2,3,4]
print a[20]

#Results.
[root@gitlab ~]# python -m dis 
 1   0 LOAD_CONST    0 (11)
    3 LOAD_CONST    1 (2)
    6 LOAD_CONST    2 (3)
    9 LOAD_CONST    3 (4)
    12 BUILD_LIST    4
    15 STORE_NAME    0 (a)

 2   18 LOAD_NAME    0 (a)
    21 LOAD_CONST    4 (20)
    24 BINARY_SUBSCR  
    25 PRINT_ITEM   
    26 PRINT_NEWLINE  
    27 LOAD_CONST    5 (None)
    30 RETURN_VALUE

A brief introduction to the dis module: as experienced users know, Python goes through a compilation step before it interprets a script. The result of that compilation is the pyc file we often see, which holds the bytecode of the code objects. dis displays this bytecode in a more readable form so that we can follow the execution process. The columns of the dis output are explained below.

The first column is the line number in the original source code.
The second column is the offset of the bytecode: the first LOAD_CONST is at offset 0, and so on.
The third column is the human-readable name of the bytecode, meant for programmers.
The fourth column is the argument of the instruction, for example an index into the table of constants or names.
The fifth column, in parentheses, is the actual value that the argument resolves to.
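
Listings like these can also be produced from inside Python rather than from a script file. The following is a sketch that assumes Python 3.4 or later, where dis.get_instructions is available; the helper name opnames is made up here. Python 3 no longer has the SLICE+N opcodes, so the names printed will differ from the Python 2.7 listings above and vary between versions.

```python
import dis

def opnames(expr):
    """Compile an expression and return the names of its bytecode instructions."""
    code = compile(expr, "<demo>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

slice_ops = opnames("a[20:30]")   # slicing
index_ops = opnames("a[20]")      # single subscript fetch

print(slice_ops)
print(index_ops)
```

Compiling does not execute the expression, so the name a does not need to exist.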

The main difference is that slicing is implemented with the bytecode SLICE+3, while a single-subscript fetch is implemented with the bytecode BINARY_SUBSCR. As we suspected, the syntax is similar but the code behind it is very different. Since our topic is slicing (SLICE+3), we won't go into BINARY_SUBSCR; if you're interested, you can read the source code to see how it's implemented, here: python/object/

So let's get down to business, SLICE+3.

/* Taken from: python2.7 python/ */

// Step one.
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
  .... // Omit n lines of code
  TARGET_WITH_IMPL_NOARG(SLICE, _slice)
  TARGET_WITH_IMPL_NOARG(SLICE_1, _slice)
  TARGET_WITH_IMPL_NOARG(SLICE_2, _slice)
  TARGET_WITH_IMPL_NOARG(SLICE_3, _slice)
  _slice:
  {
   if ((opcode-SLICE) & 2)
    w = POP();
   else
    w = NULL;
   if ((opcode-SLICE) & 1)
    v = POP();
   else
    v = NULL;
   u = TOP();
   x = apply_slice(u, v, w); // Fetch v: ilow, w: ihigh, then call apply_slice
   Py_DECREF(u);
   Py_XDECREF(v);
   Py_XDECREF(w);
   SET_TOP(x);
   if (x != NULL) DISPATCH();
   break;
  }

 .... // Omit n lines of code
}

// Step two.
apply_slice(PyObject *u, PyObject *v, PyObject *w) /* return u[v:w] */
{
 PyTypeObject *tp = u->ob_type;  
 PySequenceMethods *sq = tp->tp_as_sequence;

 if (sq && sq->sq_slice && ISINDEX(v) && ISINDEX(w)) { // Check that v and w are integer/long integer objects (or absent).
  Py_ssize_t ilow = 0, ihigh = PY_SSIZE_T_MAX;
  if (!_PyEval_SliceIndex(v, &ilow))    // Re-check the v object and convert its value to ilow.
   return NULL;
  if (!_PyEval_SliceIndex(w, &ihigh))    // Same for w and ihigh.
   return NULL;
  return PySequence_GetSlice(u, ilow, ihigh);  // Hand over to the sequence slicing routine for the u object.
 }
 else {
  PyObject *slice = PySlice_New(v, w, NULL);
  if (slice != NULL) {
   PyObject *res = PyObject_GetItem(u, slice);
   Py_DECREF(slice);
   return res;
  }
  else
   return NULL;
 }
}

// Step Three.
PySequence_GetSlice(PyObject *s, Py_ssize_t i1, Py_ssize_t i2)
{
 PySequenceMethods *m;
 PyMappingMethods *mp;

 if (!s) return null_error();

 m = s->ob_type->tp_as_sequence;
 if (m && m->sq_slice) {
  if (i1 < 0 || i2 < 0) {
   if (m->sq_length) {
    // Simple normalization: if the left or right subscript is less than 0, add the sequence length to it.
    Py_ssize_t l = (*m->sq_length)(s);
    if (l < 0)
     return NULL;
    if (i1 < 0)
     i1 += l;
    if (i2 < 0)
     i2 += l;
   }
  }
  // Actually call the object's sq_slice function to perform the slicing operation.
  return m->sq_slice(s, i1, i2);
 } else if ((mp = s->ob_type->tp_as_mapping) && mp->mp_subscript) {
  PyObject *res;
  PyObject *slice = _PySlice_FromIndices(i1, i2);
  if (!slice)
   return NULL;
  res = mp->mp_subscript(s, slice);
  Py_DECREF(slice);
  return res;
 }

 return type_error("'%.200s' object is unsliceable", s);
}
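
In Python terms, the negative-subscript adjustment in step three can be modeled in a few lines. This is a simplified sketch, not CPython's code: the function name adjust_negative is hypothetical, and it covers only the branch of PySequence_GetSlice that adds the length to negative subscripts.

```python
def adjust_negative(i1, i2, length):
    """Mimic PySequence_GetSlice: add the sequence length to negative subscripts."""
    if i1 < 0:
        i1 += length
    if i2 < 0:
        i2 += length
    return i1, i2

print(adjust_negative(-3, -1, 5))    # (2, 4)
print(adjust_negative(-100, 2, 5))   # (-95, 2): still negative after one adjustment
print(adjust_negative(10, 20, 4))    # (10, 20): subscripts that are too large pass through unchanged
```

Note that nothing at this stage rejects or corrects subscripts that are too large; they are passed on to sq_slice as they are.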

The C code above is a bit long, but the key places are commented, and those are all we need to focus on. As shown above, execution ends up at m->sq_slice(s, i1, i2). This sq_slice is a bit special, because it points to a different function for each type of object. Here are the corresponding functions:

// String objects
: (ssizessizeargfunc)string_slice, /*sq_slice*/

// List objects
: (ssizessizeargfunc)list_slice,  /* sq_slice */

// Tuples
: (ssizessizeargfunc)tupleslice,  /* sq_slice */

Since the three functions are implemented in much the same way, analyzing one of them is enough. Below is the analysis of the list slicing function.

/* Taken from */
static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
 PyListObject *np;
 PyObject **src, **dest;
 Py_ssize_t i, len;
 if (ilow < 0)
  ilow = 0;
 else if (ilow > Py_SIZE(a))    // If ilow is greater than the length of a, reset it to the length of a.
  ilow = Py_SIZE(a);
 if (ihigh < ilow)  
  ihigh = ilow;
 else if (ihigh > Py_SIZE(a))    // If ihigh is greater than the length of a, reset it to the length of a.
  ihigh = Py_SIZE(a);
 len = ihigh - ilow;
 np = (PyListObject *) PyList_New(len); // Create a new list object of length ihigh - ilow
 if (np == NULL)
  return NULL;

 src = a->ob_item + ilow;
 dest = np->ob_item;
 for (i = 0; i < len; i++) {    // Add the members in the range to the new list object.
  PyObject *v = src[i];
  Py_INCREF(v);
  dest[i] = v;
 }
 return (PyObject *)np;
}
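
The clamping logic is easier to follow when transcribed into Python. The following is a pure-Python model written for this explanation, not CPython's implementation; the name list_slice_model is hypothetical, and it assumes negative subscripts have already been adjusted by the caller, as PySequence_GetSlice does.

```python
def list_slice_model(a, ilow, ihigh):
    """Pure-Python model of the clamping done by list_slice."""
    size = len(a)
    if ilow < 0:
        ilow = 0
    elif ilow > size:
        ilow = size          # too large: reset to the length
    if ihigh < ilow:
        ihigh = ilow         # never end before the start
    elif ihigh > size:
        ihigh = size         # too large: reset to the length
    # Copy the members in the range [ilow, ihigh) into a new list.
    return [a[i] for i in range(ilow, ihigh)]

a = [1, 2, 3, 5]
print(list_slice_model(a, 10, 20))  # [] because both subscripts are reset to 4
print(list_slice_model(a, 1, 100))  # [2, 3, 5]
print(list_slice_model(a, 3, 1))    # [] because ihigh is raised to ilow
```

For non-negative subscripts the model agrees with the built-in a[ilow:ihigh], which makes it convenient for experimenting with edge cases.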

Conclusion

As the slicing function behind sq_slice shows, when a slice is taken and the left or right subscript is greater than the length of the sequence, that subscript is reset to the length of the sequence. So our opening slice, print a[10:20], actually runs as print a[4:4]. With this analysis, slices whose subscripts exceed the length of the object should no longer be confusing.
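
The same clamping can be observed without reading the C source, using the built-in slice.indices method, which reports the start, stop and step that a slice resolves to for a given length. It is not the SLICE+3 code path analyzed above, but it applies the same rule. A short check, using the list from the opening example:

```python
a = [1, 2, 3, 5]

# Ask the slice object how 10:20 resolves for a sequence of length 4.
start, stop, step = slice(10, 20).indices(len(a))
print((start, stop, step))     # (4, 4, 1)
print(a[10:20] == a[4:4])      # True
```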

That is all for this article. I hope it is of some help in your study or work. If you have any questions, feel free to leave a comment. Thank you for your support.