How do I create character arrays in numpy?

python string character-encoding numpy

Actually, you can do this without any copies or list comprehensions in numpy (caveats about non-equal-length strings aside...). Just view it as a 1 character string array and reshape it:

import numpy as np
x = np.array(['hello', 'snake', 'plate'], dtype=str)
# On Python 3 a str array is unicode ('<U5'), so view it as 'U1' (on Python 2, use 'S1')
y = x.view('U1').reshape((x.size, -1))
print(repr(y))

This yields:

array([['h', 'e', 'l', 'l', 'o'],
       ['s', 'n', 'a', 'k', 'e'],
       ['p', 'l', 'a', 't', 'e']], dtype='<U1')

Generally speaking, though, I'd avoid using numpy arrays to store strings in most cases. There are cases where it's useful, but you're usually better off sticking to data structures that allow variable-length strings for, well, holding strings.
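To make the unequal-length caveat concrete, here is a minimal sketch, assuming Python 3 (where str arrays are unicode, hence the 'U1' view). The names words and chars are just illustrative. NumPy pads every item to the longest length, so the view exposes that padding as empty strings:

```python
import numpy as np

# Unequal-length words: NumPy stores each item at the longest length ('<U6' here).
words = np.array(['snakes', 'on', 'a', 'plane'])

# Reinterpret the same buffer as single characters; no data is copied.
chars = words.view('U1').reshape(words.size, -1)

print(chars.shape)                     # (4, 6)
print(np.shares_memory(words, chars))  # True
print(chars[1].tolist())               # ['o', 'n', '', '', '', '']
```

Because chars is a view, writing into it also changes words, which may or may not be what you want.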

You can create a numpy character array directly e.g.:

b = np.array([ ['h','e','l','l','o'],['s','n','a','k','e'],['p','l','a','t','e'] ])

The usual array tricks work with this.
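As a rough illustration of what "the usual array tricks" might look like on such an array, here is a short sketch. The variable names first_letters, l_counts and words are only for this example:

```python
import numpy as np

b = np.array([list('hello'), list('snake'), list('plate')])

# Column slicing: the first letter of every word.
first_letters = b[:, 0]

# Elementwise comparison gives a boolean mask; sum it per row to count matches.
l_counts = (b == 'l').sum(axis=1)

# Join each row back into an ordinary Python string.
words = [''.join(row) for row in b]

print(first_letters.tolist())  # ['h', 's', 'p']
print(l_counts.tolist())       # [2, 0, 1]
print(words)                   # ['hello', 'snake', 'plate']
```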

If you have a list of words a (e.g. ['hello','snake','plate']) and wish to generate b from it, note that:

list('hello') == ['h','e','l','l','o']

So you can do something like:

b = np.array([ list(word) for word in a ])

However, if a has words of unequal length (e.g. ['snakes','on','a','plane']), what do you want to do with the shorter words? You could pad them with spaces to the longest word:

wid = max(len(w) for w in a)
b = np.array([ list(w.center(wid)) for w in a])

Here str.center(width) pads with spaces, centering the string. You could also use rjust or ljust (see the string docs).
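For comparison, a small sketch of the ljust and rjust variants, using the same example word list. The names left and right are only for illustration:

```python
import numpy as np

a = ['snakes', 'on', 'a', 'plane']
wid = max(len(w) for w in a)

# ljust pads on the right, so every word starts in column 0.
left = np.array([list(w.ljust(wid)) for w in a])

# rjust pads on the left, so every word ends in the last column.
right = np.array([list(w.rjust(wid)) for w in a])

print(left.shape)      # (4, 6)
print(repr(''.join(left[1])))   # 'on    '
print(repr(''.join(right[2])))  # '     a'
```

Which one to use depends on whether you want the padding before or after the letters. All three methods also accept a fill character as a second argument if spaces are not suitable.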

Specify a one-character unicode dtype ('U1') and give the string length as the shape parameter:

> string_array = ['..##.#..#.', '##..#.....', '#...##..#.', '####.#...#', '##.##.###.', '##...#.###', '.#.#.#..##', '..#....#..', '###...#.#.', '..###..###']
> numpy.array(string_array,dtype=('U1',10))
array([['.', '.', '#', '#', '.', '#', '.', '.', '#', '.'],
       ['#', '#', '.', '.', '#', '.', '.', '.', '.', '.'],
       ['#', '.', '.', '.', '#', '#', '.', '.', '#', '.'],
       ['#', '#', '#', '#', '.', '#', '.', '.', '.', '#'],
       ['#', '#', '.', '#', '#', '.', '#', '#', '#', '.'],
       ['#', '#', '.', '.', '.', '#', '.', '#', '#', '#'],
       ['.', '#', '.', '#', '.', '#', '.', '.', '#', '#'],
       ['.', '.', '#', '.', '.', '.', '.', '#', '.', '.'],
       ['#', '#', '#', '.', '.', '.', '#', '.', '#', '.'],
       ['.', '.', '#', '#', '#', '.', '.', '#', '#', '#']], dtype='<U1')

This apparently should never have worked (see https://github.com/numpy/numpy/issues/18407) and stops working in numpy 1.20.1, but an easy replacement is:

numpy.array(list(map(list, string_array)))

This converts the list of strings to a list of character lists before numpy receives it, avoiding the need to set the dtype explicitly.
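A quick sketch of that replacement on a smaller, made-up grid, to show that numpy infers a one-character unicode dtype by itself. The shortened string_array and the names grid and hashes are only for this example:

```python
import numpy as np

# A smaller stand-in for the grid above (assumed equal-length rows).
string_array = ['..##.', '##..#', '#...#']

# Each string becomes a list of characters before numpy sees it.
grid = np.array(list(map(list, string_array)))

print(grid.shape)       # (3, 5)
print(grid.dtype.kind)  # U

# The result behaves like any other 2-D array, e.g. counting '#' cells.
hashes = int((grid == '#').sum())
print(hashes)           # 7
```

If the rows have different lengths this will not give a 2-D character array, so pad the strings first if that can happen.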
