
Python Character Codes

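Notes from getting stuck on Japanese text while working with strings in Python (a personal memo; it may contain misunderstandings).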

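Python was made by people who write in "abc", and anything else is treated as the exception. Once the special treatment of "abc" (the ASCII privilege described later) is kept in mind, the behavior becomes simple to understand.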


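Unicode is a character code (a correspondence between a representation and the characters it stands for).
utf-8 is a character code.
A Python unicode object is Unicode text as it is held by the CPU and in memory.
A Python str object is a sequence of bytes in some character code; it is what goes in and out.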

Creating a unicode object

The Unicode code point of あ is 3042 in hexadecimal, which is 12354 in decimal. Use 12354 to create the unicode object that represents あ.

>>> unichr(12354)
u'\u3042'

Check the object's type.

>>> import types
>>> type(unichr(12354))
<type 'unicode'>

It is unicode.
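
The relationship between the character and its code point can also be checked in the other direction. The following is a minimal sketch, assuming only the built-in ord() and hex(); the variable names are illustrative and not from the original notes.

```python
# Go from the character back to its integer code point.
a_char = u'\u3042'            # the same character, written with a unicode escape
code_point = ord(a_char)      # character -> integer code point

print(code_point)             # 12354
print(hex(code_point))        # 0x3042
```

ord() is the inverse of the conversion shown above, so the decimal and hexadecimal values quoted in the text can be confirmed directly.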

Output to the terminal

Try writing the unicode object to the terminal from the interactive environment.

>>> import sys
>>> sys.stdout.write(unichr(12354))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: 
ordinal not in range(128)

An error occurs. Encoding is required; a unicode object cannot be output as it is.

>>> sys.stdout.write(unichr(12354).encode('utf-8'))

This time the output works.

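For output to a terminal or a file, a unicode object will not do; it has to be encoded into bytes with some coding system. My terminal is set to utf-8, so here the encoding was done with utf-8.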

In the interactive environment, print accepts a unicode object, encodes it to bytes on its own, and outputs it to the terminal.

>>> print unichr(12354)

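It prints correctly (at least in my environment). Things go wrong, however, when a script's output is redirected to a file.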

$ echo "print unichr(12354)" > test.py
$ python test.py
$ 
$ python test.py > test.txt
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    print unichr(12354)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: 
ordinal not in range(128)

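My Python knows that my terminal is utf-8, so when writing to standard output it helpfully encodes in utf-8. When standard output is redirected to a file, it has no way of knowing which coding system I want.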

Passing print properly encoded bytes works.

>>> print unichr(12354).encode('utf-8')

This can also be written to a file with redirection.
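
One way to avoid depending on what Python guesses is to encode explicitly before writing. The following is a minimal sketch, not the author's method: `output_encoding` and `write_text` are hypothetical helpers, the utf-8 fallback is an assumption, and an in-memory `io.BytesIO` stands in for a redirected file.

```python
import io

def output_encoding(stream, default='utf-8'):
    """Return the stream's declared encoding, or `default` if it has none."""
    return getattr(stream, 'encoding', None) or default

def write_text(stream, text, default='utf-8'):
    """Encode `text` explicitly and write the resulting bytes to a binary stream."""
    stream.write(text.encode(output_encoding(stream, default)))

# BytesIO reports no encoding, much like standard output redirected to a file.
fake_file = io.BytesIO()
write_text(fake_file, u'\u3042\n')
written = fake_file.getvalue()    # the three utf-8 bytes followed by a newline
```

The fallback only applies when the stream does not say what it expects, so a stream that does declare an encoding keeps its setting.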

So what type is the encoded result?

>>> import types
>>> type(unichr(12354).encode('utf-8'))
<type 'str'>

It is a str object. A str is a sequence of bytes; reading it as one character per byte is a misunderstanding.

A unicode object encoded with utf-8 becomes a str object, and that can be output to a file.

Input from the terminal

Text typed at the terminal arrives as bytes in the terminal's coding system.

>>> 'あ'
'\xe3\x81\x82'

My terminal is utf-8, so the single character あ was entered as a sequence of 3 bytes. Create a unicode object from these bytes.

>>> unicode('あ',encoding='utf-8')
u'\u3042'

Here encoding='utf-8' indicates that the first argument 'あ' is a byte sequence encoded in utf-8, to be converted into a unicode object.

In Python, converting bytes into a unicode object is called decoding (even though the argument that specifies the codec is named encoding).

unicode  <-- decode --  str (bytes)
         -- encode -->

The above can also be written as follows.

>>> 'あ'.decode('utf-8')
u'\u3042'

This works too.

In my interactive environment, terminal input is taken to be utf-8, so a unicode literal can be typed directly.

>>> u'あ'
u'\u3042'

The same result is obtained.
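
The two directions described in this section can be put side by side as a round trip. This is a small sketch using only the encode() and decode() methods already shown; the variable names are illustrative, and the literals are written with escapes so that the snippet does not depend on the terminal's coding system.

```python
text = u'\u3042'                   # unicode object
data = text.encode('utf-8')        # unicode -> bytes (encode)
restored = data.decode('utf-8')    # bytes -> unicode (decode)

print(restored == text)            # True
```

Encoding and decoding with the same coding system returns the original unicode object, which is why the coding system has to be known on both sides.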

File input and output

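Everything so far used standard input and output on a utf-8 terminal. Reading and writing text files follows the same basic idea.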

>>> import codecs
>>> f=codecs.open('a.txt', 'r', 'utf-8')
>>> l=f.readline()
>>> f.close()
>>> l
u'\u3042\n'
>>> type(l)
<type 'unicode'>
>>> print l.encode('utf-8')
あ

When the file's coding system is specified on opening, the object read from the file is already decoded to unicode.
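
The write side works the same way. The following is a minimal sketch that swaps in `io.open` for the `codecs.open` call used above (both accept an encoding); the temporary directory and the variable names are assumptions made so that the example is self-contained.

```python
import io
import os
import tempfile

# A temporary stand-in for a.txt.
path = os.path.join(tempfile.mkdtemp(), 'a.txt')

# Writing: unicode goes in, utf-8 bytes end up on disk.
with io.open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write(u'\u3042\n')

# Reading in binary mode shows the bytes actually stored.
with io.open(path, 'rb') as f:
    raw = f.read()

# Reading with an encoding decodes the bytes on the way in.
with io.open(path, 'r', encoding='utf-8') as f:
    line = f.readline()
```

Here `raw` holds the encoded bytes and `line` holds the decoded unicode object, so the file boundary is the only place where the coding system has to be named.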

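Likewise, when Python reads a program file, its contents are interpreted according to the coding system declared for that file.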

$ cat test.py
#coding: euc-jp
print type('あ')
print type(u'あ')
print u'あ'.encode('utf-8')

If a utf-8 file is mistakenly declared as #coding: euc-jp and run as is:

$ python test.py
  File "test.py", line 2
SyntaxError: 'euc_jp' codec can't decode bytes in position 12-13: illegal multibyte sequence

An error occurs. Now convert the program file to euc, matching the declared coding system, and run it.

$ nkf -e test.py > test_euc.py
$ python test_euc.py
<type 'str'>
<type 'unicode'>
あ

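The 'あ' written in the euc file is a byte sequence encoded in euc (a str), while u'あ' is decoded to unicode. At the end of the example, the unicode object is encoded in utf-8 and displayed on the utf-8 terminal.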
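
The same mismatch can be reproduced without a source file. The following is a small sketch using the standard 'utf-8' and 'euc_jp' codecs; the variable names are illustrative.

```python
text = u'\u3042'
utf8_bytes = text.encode('utf-8')     # three bytes: e3 81 82
euc_bytes = text.encode('euc_jp')     # two bytes: a4 a2

# Decoding with the matching codec recovers the character.
assert euc_bytes.decode('euc_jp') == text

# Decoding utf-8 bytes as euc_jp fails, as it did for the mislabeled file.
try:
    utf8_bytes.decode('euc_jp')
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

The character is the same in both cases; only the bytes differ, so a declaration that names the wrong coding system makes the bytes unreadable.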

The ASCII privilege

Python treats ASCII characters specially. Characters defined in ASCII can be used with far less care than Japanese characters.

To create a unicode object from an integer code point, あ is made like this.

>>> unichr(12354)
u'\u3042'

Making "a" the same way:

>>> unichr(97)
u'a'

It is displayed as a character that a human can read.

Entering Japanese in the interactive environment:

>>> 'あ'
'\xe3\x81\x82'

A sequence of bytes is displayed. For a character defined in ASCII:

>>> 'a'
'a'

The display is human-readable.

Writing the unicode object for あ to standard output raises an error.

>>> sys.stdout.write(unichr(12354))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)

Writing the unicode object for "a" to standard output encodes it implicitly and displays it as is.

>>> sys.stdout.write(unichr(97))
a

When a unicode object is created from input bytes and the character is Japanese:

>>> unicode('あ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

That fails.

>>> unicode('あ',encoding='utf-8')
u'\u3042'

Without a coding system for the first argument, it fails. If the argument is within the range defined by ASCII, however:

>>> unicode('a')
u'a'

It is converted silently.

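Within the range of the ASCII privilege, bytes can be made with the chr() function, which takes an ASCII code (an integer) as its argument and returns the byte. ASCII defines the values from 0 to 127.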

>>> chr(97)
'a'

For あ:

>>> unichr(12354).encode('utf-8')
'\xe3\x81\x82'

The + operator joins unicode objects with each other and encoded bytes with each other.

Within the ASCII range, bytes and a unicode object can be joined as well.

>>> chr(97) + unichr(12355)
u'a\u3043'

The bytes are decoded implicitly and no error occurs. This is the ASCII privilege.

Trying the same thing with あ instead of "a":

>>> unichr(12354).encode('utf-8') + unichr(12355)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: 
ordinal not in range(128)

An error occurs.
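
The failure goes away when the bytes are decoded explicitly before joining. The following is a minimal sketch, assuming the bytes are known to be utf-8; the variable names are illustrative.

```python
encoded = u'\u3042'.encode('utf-8')   # bytes for the first character
other = u'\u3043'                     # unicode object for the second character

# Decode first, then join unicode with unicode.
joined = encoded.decode('utf-8') + other
```

Naming the coding system at the point of decoding means the result no longer depends on whether the bytes happen to fall inside the ASCII range.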

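Operations that work character by character, such as the len() function, have to be done on unicode objects. With ASCII text they happen to work on bytes too, which is again within the range of the ASCII privilege.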
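
The difference shows up directly in what len() counts. This is a small sketch comparing the character count and the utf-8 byte count for an ASCII character and for あ; the variable names are illustrative.

```python
ascii_text = u'a'
japanese_text = u'\u3042'

ascii_chars = len(ascii_text)                          # 1 character
ascii_bytes = len(ascii_text.encode('utf-8'))          # 1 byte

japanese_chars = len(japanese_text)                    # 1 character
japanese_bytes = len(japanese_text.encode('utf-8'))    # 3 bytes
```

For ASCII the two counts agree, so code that measures bytes appears to work until it meets a multi-byte character.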

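Code written with only ASCII in mind tends to lean on this privilege and then raises exceptions on Japanese text; code should be written so that multi-byte characters work too. Once the ASCII privilege is correctly understood, using Japanese is simple.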

Problems seen in real applications (1) sqlite3

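Coding-system problems are not limited to standard input/output and files. They also come up when an imported library is used and it matters whether a value is an encoded str object or a unicode object.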

sqlite3 is a simple SQL database. Python has a built-in module for it, which makes it convenient to use.

The next example inserts the unicode object made with unichr(12354) into the database.

import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table test (u)")
cur.execute("insert into test(u) values (?)",(unichr(12354),))
cur.execute("select * from test")
r=cur.fetchone()
print r[0]
print type(r[0])

The select returns a unicode object.

If unichr(12354).encode('utf-8') is used in place of unichr(12354), that is, a str object encoded in utf-8, the following happens.

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

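An error occurs: bytes cannot be inserted (insert a unicode object instead). With bytes in the ASCII range, such as chr(97), no error occurs (the allowed range is ASCII only and does not extend to latin-1). In that case too, the select returns a unicode object.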
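
When the value at hand is already encoded, decoding it before the insert avoids the problem. The following is a minimal sketch based on the example above, assuming the bytes are utf-8; the variable names are illustrative.

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('create table test (u)')

encoded = u'\u3042'.encode('utf-8')     # bytes, as in the failing case
decoded = encoded.decode('utf-8')       # back to a unicode object before inserting
cur.execute('insert into test(u) values (?)', (decoded,))

cur.execute('select u from test')
row = cur.fetchone()                    # row[0] is the stored text
con.close()
```

Keeping the database boundary unicode-only follows the recommendation in the error message itself.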

Problems seen in real applications (2) tweepy

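Coding-system problems are not limited to standard input/output and files. They also come up with the objects returned by an imported library.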

tweepy is a library for the twitter API. twitter supports Unicode, so let's see whether tweepy returns unicode objects or bytes.

#!/usr/bin/env python
import tweepy
sqlite3file='/home/hoshino/diary/twitter_log.sqlite'
credential = {
    'CONSUMER_KEY': 'xxxxxxxxxxxxxxxxxxxxx',
    'CONSUMER_SECRET': 'xxxxxxxxxxxxxxxxxxxxx',
    'ACCESS_KEY': 'xxxxxxxxxxxxxxxxxxxxx',
    'ACCESS_SECRET': 'xxxxxxxxxxxxxxxxxxxxx'}
maxcount=-1
auth = tweepy.OAuthHandler(credential['CONSUMER_KEY'], credential['CONSUMER_SECRET'])
auth.set_access_token(credential['ACCESS_KEY'], credential['ACCESS_SECRET'])
api = tweepy.API(auth)
t=api.home_timeline(count=1)
print type(t[0].author.screen_name)
print type(t[0].text)

The screen name was a str, while the tweet text was unicode.
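
When a library hands back a mixture of the two types, one option is to normalize everything to unicode at the boundary. The following is a minimal sketch: `to_unicode` is a hypothetical helper and not part of tweepy, utf-8 is an assumption, and the two values are stand-ins rather than real API output.

```python
def to_unicode(value, encoding='utf-8'):
    """Return `value` as text, decoding it first if it is a byte string.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Stand-ins for the two kinds of value observed above.
screen_name = b'example_user'     # byte string
text = u'\u3042'                  # unicode object

label = to_unicode(screen_name) + u': ' + to_unicode(text)
```

After normalization every value is a unicode object, so the rest of the program does not have to rely on the ASCII privilege.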

Python3

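Character handling changes substantially in Python3 and becomes simpler: what used to be str and unicode are unified. See the following for details.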
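
As a rough illustration of the Python 3 model, text is str and encoded data is bytes, and the two are never mixed implicitly. This sketch is for Python 3 only (it behaves differently on Python 2); the variable names are illustrative.

```python
# Python 3 only.
text = '\u3042'                  # str: a sequence of characters
data = text.encode('utf-8')      # bytes: a sequence of bytes

print(type(text).__name__)       # str
print(type(data).__name__)       # bytes

# Joining the two is rejected outright instead of being decoded implicitly.
try:
    data + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Because mixing raises an error even for ASCII content, the special case discussed in the earlier sections no longer applies.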

http://www.python.jp/doc/2.5/lib/encodings-overview.html
http://docs.python.org/release/3.0.1/howto/unicode.html
(translation: http://www.geocities.jp/tan9ent/unicode.html)
(for Python3)
http://d.hatena.ne.jp/fgshun/20090901/1251818730
