hg38 NCBI RefSeq genes, curated subset (NM_*, NR_*, NP_* or YP_*) (2024)

RefSeq Gene CCN2-AS1


RefSeq: NR_187596.1
Status: Validated
Description: CCN2 antisense RNA 1, transcript variant 4
Molecule type: ncRNA
Source: BestRefSeq
Biotype: lncRNA
HGNC: 40164
Entrez Gene: 122152366
GeneCards: CCN2-AS1
AceView: CCN2-AS1

mRNA/Genomic Alignments (NR_187596.1)

BROWSER | SIZE  IDENTITY  CHROMOSOME  STRAND  START      END        QUERY        START  END   TOTAL
-------------------------------------------------------------------------------------------------
browser | 3883  100.0%    6           +       131901952  132102325  NR_187596.1  1      3883  3883
Position: chr6:131901952-132102325
Band: 6q23.2
Genomic Size: 200374
Strand: +
Gene Symbol: CCN2-AS1
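As a quick check on the fields above: the Position is displayed in 1-based, fully closed coordinates, so the Genomic Size equals end - start + 1. The Python sketch below shows that arithmetic and the conversion to the 0-based, half-open start used in UCSC database tables and BED files. The function names are hypothetical and for illustration only.

```python
import re

# A UCSC-style position string such as "chr6:131901952-132102325".
# Displayed positions are 1-based and inclusive at both ends.
POSITION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_position(position: str) -> tuple[str, int, int]:
    """Return (chrom, start, end) in 1-based inclusive coordinates."""
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    # Positions may be written with thousands separators; strip them.
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    return match["chrom"], start, end


def genomic_size(position: str) -> int:
    """Length of a 1-based inclusive interval: end - start + 1."""
    _, start, end = parse_position(position)
    return end - start + 1


def to_zero_based_half_open(position: str) -> tuple[str, int, int]:
    """Convert to 0-based, half-open coordinates (start - 1, end)."""
    chrom, start, end = parse_position(position)
    return chrom, start - 1, end


print(genomic_size("chr6:131901952-132102325"))  # 200374
```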

Links to sequence:

  • Predicted mRNA (may be different from the genomic sequence)
  • Genomic Sequence from assembly

Source data version: NCBI RefSeq GCF_000001405.40-RS_2023_10 (2023-10-11)
Data last updated at UCSC: 2024-01-29

Description

The NCBI RefSeq Genes composite track shows human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences between the annotation coordinates provided by UCSC and NCBI. For RNA-seq analysis, we advise using NCBI aligned tables like RefSeq All or RefSeq Curated. See the Methods section for more details about how the different tracks were created.

Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records.

For more information on the different gene tracks, see our Genes FAQ.

Display Conventions and Configuration

This track is a composite track that contains differing data sets. To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to hide. Note: Not all subtracks are available on all assemblies.

The possible subtracks include:

RefSeq aligned annotations and UCSC alignment of RefSeq annotations
  • RefSeq All – all curated and predicted annotations provided by RefSeq.
  • RefSeq Curated – subset of RefSeq All that includes only those annotations whose accessions begin with NM, NR, NP or YP. (NP and YP are used only for protein-coding genes on the mitochondrion; YP is used for human only.)
  • RefSeq Predicted – subset of RefSeq All that includes those annotations whose accessions begin with XM or XR.
  • RefSeq Other – all other annotations produced by the RefSeq group that do not fit the requirements for inclusion in the RefSeq Curated or the RefSeq Predicted tracks, as they do not have a product and therefore no RefSeq accession. More than 90% are pseudogenes, T-cell receptor or immunoglobulin segments. The few remaining entries are gene clusters (e.g. protocadherin).
  • RefSeq Alignments – alignments of RefSeq RNAs to the human genome provided by the RefSeq group, following the display conventions for PSL tracks.
  • RefSeq Diffs – alignment differences between the human reference genome(s) and RefSeq transcripts. (Track not currently available for every assembly.)
  • UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the human genome. This track was previously known as the "RefSeq Genes" track.
  • RefSeq Select+MANE (subset) – Subset of RefSeq Curated, transcripts marked as RefSeq Select or MANE Select. A single Select transcript is chosen as representative for each protein-coding gene. This track includes transcripts categorized as MANE, which are further agreed upon as representative by both NCBI RefSeq and Ensembl/GENCODE, and have a 100% identical match to a transcript in the Ensembl annotation. See NCBI RefSeq Select. Note that we provide a separate track, MANE (hg38), which contains only the MANE transcripts.
  • RefSeq HGMD (subset) – Subset of RefSeq Curated, transcripts annotated by the Human Gene Mutation Database. This track is only available on the human genomes hg19 and hg38. It is the most restricted RefSeq subset, targeting clinical diagnostics.
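The prefix rules in this list can be expressed as a small lookup. The Python sketch below uses only the two-letter prefixes listed for RefSeq Curated and RefSeq Predicted. The function name is hypothetical, and the sketch does not cover RefSeq Other, whose entries have no RefSeq accession.

```python
# Two-letter prefixes as listed in the track description above.
CURATED_PREFIXES = {"NM", "NR", "NP", "YP"}
PREDICTED_PREFIXES = {"XM", "XR"}


def refseq_subtrack(accession: str) -> str:
    """Name the subtrack an accession belongs to, judged by its prefix only.

    Both subtracks are also part of RefSeq All. Any other prefix returns
    "unknown", which is a placeholder label and not a track name.
    """
    prefix = accession.split("_", 1)[0].upper()
    if prefix in CURATED_PREFIXES:
        return "RefSeq Curated"
    if prefix in PREDICTED_PREFIXES:
        return "RefSeq Predicted"
    return "unknown"


print(refseq_subtrack("NR_187596.1"))  # RefSeq Curated
```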

The RefSeq All, RefSeq Curated, RefSeq Predicted, RefSeq HGMD, RefSeq Select/MANE and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq.

Level of review (from darkest to lightest shading):
Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.
Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff.
Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.

The item labels and codon display properties for features within this track can be configured through the check-box controls at the top of the track description page. To adjust the settings for an individual subtrack, click the wrench icon next to the track name in the subtrack list.

  • Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name or OMIM identifier instead of the gene name, show all or a subset of these labels including the gene name, OMIM identifier and accession names, or turn off the label completely.
  • Codon coloring: This track has an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page.

The RefSeq Diffs track contains five different types of inconsistency between the reference genome sequence and the RefSeq transcript sequences. The five types of differences are as follows:

  • mismatch – aligned but mismatching bases, plus HGVS g. to show the genomic change required to match the transcript and HGVS c./n. to show the transcript change required to match the genome.
  • short gap – genomic gaps that are too small to be introns (arbitrary cutoff of < 45 bp), most likely insertion/deletion variants or errors, with HGVS g. and c./n. showing differences.
  • shift gap – shortGap items whose placement could be shifted left and/or right on the genome due to repetitive sequence, with HGVS c./n. position range of ambiguous region in transcript. Here, thin and thick lines are used: the thin line shows the span of the repetitive sequence, and the thick line shows the rightmost shifted gap.
  • double gap – genomic gaps that are long enough to be introns but that skip over transcript sequence (invisible in default setting), with HGVS c./n. deletion.
  • skipped – sequence at the beginning or end of a transcript that is not aligned to the genome (invisible in default setting), with HGVS c./n. deletion.
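The size rules in this list can be sketched as a classifier over the gaps between consecutive aligned blocks. This Python example is a simplified illustration and not UCSC's implementation. It assumes blocks are given as hypothetical (transcript start, genome start, length) tuples in alignment order and uses only the 45 bp cutoff quoted above. It omits shift-gap detection, which requires the underlying sequences.

```python
# Simplified sketch, not UCSC's implementation. Shift gaps are not detected.
MIN_INTRON_BP = 45  # "arbitrary cutoff of < 45 bp" from the list above


def classify_gap(genome_gap: int, transcript_gap: int) -> str:
    """Label the space between two consecutive aligned blocks."""
    if genome_gap < 0 or transcript_gap < 0:
        raise ValueError("blocks must be sorted and non-overlapping")
    if genome_gap == 0 and transcript_gap == 0:
        return "none"
    if genome_gap >= MIN_INTRON_BP:
        # Intron-sized genomic gap. If transcript bases are skipped as well,
        # treat it as a double gap.
        return "double gap" if transcript_gap > 0 else "intron"
    # Too small for an intron. Transcript-only gaps are also labeled
    # "short gap" here, which is an assumption of this sketch.
    return "short gap"


def classify_block_gaps(blocks: list[tuple[int, int, int]]) -> list[str]:
    """blocks: (transcript_start, genome_start, length), in alignment order."""
    labels = []
    for (q1, t1, n1), (q2, t2, _) in zip(blocks, blocks[1:]):
        labels.append(classify_gap(t2 - (t1 + n1), q2 - (q1 + n1)))
    return labels


# Hypothetical blocks: a 3 bp genomic gap, a 500 bp intron, then an
# intron-sized gap that also skips 5 transcript bases.
blocks = [(0, 1000, 100), (100, 1103, 50), (150, 1653, 20), (175, 2673, 10)]
print(classify_block_gaps(blocks))  # ['short gap', 'intron', 'double gap']
```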

HGVS Terminology (Human Genome Variation Society): g. = genomic sequence; c. = coding DNA sequence; n. = non-coding RNA reference sequence.

When reporting HGVS with RefSeq sequences, to make sure that results from research articles can be mapped to the genome unambiguously, please specify the RefSeq annotation release displayed on the transcript's Genome Browser details page and also the RefSeq transcript ID with version (e.g. NM_012309.4 not NM_012309).
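Because the version matters for unambiguous mapping, it can help to validate IDs before reporting them. The Python sketch below assumes the prefix_digits.version pattern seen in the examples on this page. The helper names and the change c.100A>G are hypothetical.

```python
import re

# Versioned RefSeq ID: two-letter prefix, "_", digits, ".", version number.
VERSIONED_ID = re.compile(r"^(?P<prefix>[A-Z]{2})_(?P<number>\d+)\.(?P<version>\d+)$")


def is_versioned(accession: str) -> bool:
    """True if the accession carries an explicit version suffix."""
    return VERSIONED_ID.match(accession) is not None


def split_accession(accession: str) -> tuple[str, int]:
    """Return (accession without version, version number)."""
    match = VERSIONED_ID.match(accession)
    if match is None:
        raise ValueError(f"expected a versioned ID like NM_012309.4, got {accession!r}")
    return f"{match['prefix']}_{match['number']}", int(match["version"])


def hgvs_description(accession: str, change: str) -> str:
    """Join a versioned transcript ID and an HGVS change as 'ID:change'."""
    if not is_versioned(accession):
        raise ValueError("report the transcript version, e.g. NM_012309.4 not NM_012309")
    return f"{accession}:{change}"


print(hgvs_description("NM_012309.4", "c.100A>G"))  # NM_012309.4:c.100A>G
```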

Methods

Tracks contained in the RefSeq annotation and RefSeq RNA alignment tracks were created at UCSC using data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline is available from NCBI.

The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments.

The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. RefSeq RNAs were aligned against the human genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.
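The three filtering rules above can be illustrated as follows. This Python sketch is not UCSC's pipeline code, and it rests on two assumptions. First, "an alignment of less than 15%" refers to the percentage of the RNA covered by the alignment. Second, "within 0.1%" means within 0.1 percentage points of the best identity for the same RNA. The Alignment record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One candidate placement of an RNA (hypothetical minimal record)."""
    rna: str          # RNA accession
    position: str     # genomic placement, e.g. "chr6:131901952-132102325"
    coverage: float   # percent of the RNA in the alignment (assumed meaning)
    identity: float   # percent base identity with the genome


def filter_alignments(alignments: list[Alignment]) -> list[Alignment]:
    """Apply the coverage and identity rules described above, per RNA."""
    by_rna: dict[str, list[Alignment]] = {}
    for aln in alignments:
        # Discard alignments covering less than 15% of the RNA.
        if aln.coverage >= 15.0:
            by_rna.setdefault(aln.rna, []).append(aln)

    kept: list[Alignment] = []
    for candidates in by_rna.values():
        best = max(aln.identity for aln in candidates)
        for aln in candidates:
            # Keep placements within 0.1 percentage points of this RNA's
            # best identity that also reach at least 96% identity.
            if aln.identity >= best - 0.1 and aln.identity >= 96.0:
                kept.append(aln)
    return kept
```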

Data Access

The raw data for these tracks can be accessed in multiple ways. It can be explored interactively using the REST API, Table Browser or Data Integrator. The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing. The previous track versions are available in the archives of our downloads server. You can also access any RefSeq table entries in JSON format through our JSON API.
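With the REST API, a request for the items of one track in a region is a plain URL. This Python sketch builds a /getData/track request for the RefSeq Curated table (ncbiRefSeqCurated) around CCN2-AS1. The helper name is hypothetical, and the sketch assumes the API's start coordinate is 0-based. Fetching the JSON requires network access, so that step is left as a comment.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a /getData/track URL (start assumed 0-based, end exclusive)."""
    query = urlencode({"genome": genome, "track": track, "chrom": chrom,
                       "start": start, "end": end})
    return f"{API_BASE}/getData/track?{query}"


url = track_data_url("hg38", "ncbiRefSeqCurated", "chr6", 131901951, 132102325)
print(url)

# To fetch the JSON response (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     data = json.load(response)
```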

The data in the RefSeq Other and RefSeq Diffs tracks are organized in bigBed file format; more information about accessing the information in this bigBed file can be found below. The other subtracks are associated with database tables as follows:

genePred format:
  • RefSeq All - ncbiRefSeq
  • RefSeq Curated - ncbiRefSeqCurated
  • RefSeq Predicted - ncbiRefSeqPredicted
  • RefSeq HGMD - ncbiRefSeqHgmd
  • RefSeq Select+MANE - ncbiRefSeqSelect
  • UCSC RefSeq - refGene
PSL format:
  • RefSeq Alignments - ncbiRefSeqPsl

The first column of each of these tables is "bin". This column is designed to speed up access for display in the Genome Browser, but can be safely ignored in downstream analysis. You can read more about the bin indexing system in UCSC's documentation.
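The bin column can be ignored downstream, but it is easy to recompute, which makes it a handy sanity check when parsing table dumps. The Python sketch below makes two assumptions. First, it uses the standard UCSC binning scheme for sequences shorter than 512 Mb (smallest bins of 128 kb, each level eight times larger). Second, it expects the usual genePredExt column order (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, ...). Check the table schema before relying on these column positions. The function and class names are hypothetical.

```python
from dataclasses import dataclass

# Standard binning scheme for sequences shorter than 512 Mb.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bp (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the bin for a 0-based, half-open [start, end) feature."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range too large for the standard binning scheme")


@dataclass
class GenePred:
    bin_id: int
    name: str          # transcript accession
    chrom: str
    strand: str
    tx_start: int      # 0-based start
    tx_end: int        # exclusive end
    exon_starts: list[int]
    exon_ends: list[int]
    name2: str         # gene symbol


def _int_list(field: str) -> list[int]:
    # exonStarts/exonEnds are comma-separated with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_genepred_row(line: str) -> GenePred:
    """Parse one tab-separated genePred row whose first column is bin."""
    f = line.rstrip("\n").split("\t")
    return GenePred(
        bin_id=int(f[0]), name=f[1], chrom=f[2], strand=f[3],
        tx_start=int(f[4]), tx_end=int(f[5]),
        exon_starts=_int_list(f[9]), exon_ends=_int_list(f[10]),
        name2=f[12],
    )
```

Recomputing bin_from_range(tx_start, tx_end) should reproduce the stored bin. A mismatch usually means the columns were read with the wrong offsets.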

The annotations in the RefSeq Other and RefSeq Diffs tracks are stored in bigBed files, which can be obtained from our downloads server: ncbiRefSeqOther.bb and ncbiRefSeqDiffs.bb. Individual regions or the whole set of genome-wide annotations can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system from the utilities directory on the UCSC downloads server. For example, to extract only annotations in a given region, you could use the following command:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeq/ncbiRefSeqOther.bb -chrom=chr16 -start=34990190 -end=36727467 stdout
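The same extraction can be scripted. In this Python sketch the helper names are hypothetical. It passes only the bigBedToBed arguments shown in the command above and assumes bigBedToBed is on your PATH. It parses the first four BED columns of the output (chrom, 0-based start, end, name).

```python
import subprocess

BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeq/ncbiRefSeqOther.bb"


def bigbed_region_command(bb: str, chrom: str, start: int, end: int) -> list[str]:
    """Argument list mirroring the command above, writing BED to stdout."""
    return ["bigBedToBed", bb, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def parse_bed(text: str) -> list[tuple[str, int, int, str]]:
    """Return (chrom, start, end, name) tuples; start is 0-based."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def fetch_region(bb: str, chrom: str, start: int, end: int) -> list[tuple[str, int, int, str]]:
    """Run bigBedToBed and parse its output (needs the tool and network)."""
    result = subprocess.run(
        bigbed_region_command(bb, chrom, start, end),
        capture_output=True, text=True, check=True,
    )
    return parse_bed(result.stdout)


# Example (requires bigBedToBed and network access):
# records = fetch_region(BB_URL, "chr16", 34990190, 36727467)
```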

You can download a GTF format version of the RefSeq All table from the GTF downloads directory. The genePred format tracks can also be converted to GTF format using the genePredToGtf utility, available from the utilities directory on the UCSC downloads server. The utility can be run from the command line like so:

genePredToGtf hg38 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf
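The resulting GTF file stores transcript and gene identifiers in its ninth column as key "value" pairs. A minimal Python sketch for reading them follows. It assumes quoted attribute values, keeps only the last value of any repeated key, and uses hypothetical function names. GTF coordinates are 1-based and inclusive.

```python
import re

# key "value" pairs in GTF column 9, e.g. gene_id "CCN2-AS1";
ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line: str) -> dict:
    """Split one GTF record into named columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE_RE.findall(attributes)),
    }


def transcripts_by_gene(lines) -> dict[str, set[str]]:
    """Group transcript_id values by gene_id, skipping comments and blanks."""
    genes: dict[str, set[str]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        attrs = parse_gtf_line(line)["attributes"]
        genes.setdefault(attrs["gene_id"], set()).add(attrs["transcript_id"])
    return genes
```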

Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section.

A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks can be found on our downloads server.
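The FASTA file can be read without extra libraries. This Python sketch is a minimal reader that assumes standard '>'-prefixed headers, and the helper names are hypothetical. The exact file name on the downloads server is not given here, so sequence_lengths takes whatever local path you saved it to, gzip-compressed or plain.

```python
import gzip
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(path: str) -> dict[str, int]:
    """Map the first word of each header to its sequence length."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {h.split()[0]: len(s) for h, s in read_fasta(handle)}
```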

Please refer to our mailing list archives for questions.

Previous versions of the ncbiRefSeq set of tracks can be found on our archive download server.

Credits

This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project.

References

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518

Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018

Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979



FAQs

What are NM and NP accessions in NCBI?

An NM accession number links to an mRNA record in the Nucleotide database, and an NP accession number links to a protein record in the Protein database.

What does the RefSeq curated database contain?

NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins.

What is the hg38 human genome?

The official name for the current human reference genome assembly is Genome Reference Consortium Human Build 38. It is abbreviated as GRCh38. GRCh38 is referred to as hg38 in the UCSC Genome Browser, but this is not the official assembly name or abbreviation. The GenBank accession for GRCh38 is GCA_000001405.

What is the difference between RefSeq and GenBank at NCBI?

The RefSeq database is non-redundant because it provides a single record for each molecule, derived from all the similar sequences in GenBank. Each RefSeq record serves as a reference standard because, in principle, it is more accurate and more completely annotated than any single sequence in GenBank.

What is a RefSeq gene?

The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq sequences form a foundation for medical, functional, and diversity studies.

What is the difference between RefSeq and the INSDC nucleotide databases?

Whereas the International Nucleotide Sequence Database Collaboration (INSDC, made up of GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan) represents an archival repository of all sequences, the RefSeq database is a non-redundant set of reference standards derived from the INSDC databases that ...

How many genomes are in RefSeq?

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation.

Is the RefSeq database primary or secondary?

GenBank and SRA are primary archival resources that collect data from researchers as part of the publication process. RefSeq is a secondary database built from data submitted to the primary archives but with added curation.

What is the difference between the hg19 and hg38 genomes?

Updated assembly: hg38 provides a more complete and accurate representation of the human genome, with fewer gaps and numerous sequence updates. hg38 altered 8,000 nucleotides and expanded coverage to approximately 95% of the human genome, compared to ~92.5% in hg19.

What is the most up-to-date human reference genome?

The GRC released GRCh38.p14, a non-coordinate-changing update to the human reference assembly, in May 2022.

Is hg19 the same as GRCh37?

While hg19 and GRCh37 are the same genome build, UCSC appends "chr" to the beginning of the chromosome names, e.g. chr1, chr2, etc. On the other hand, Ensembl leaves the chromosomes as is: 1, 2, etc. Another difference is the mitochondrial genome, which UCSC labels chrM and Ensembl labels MT.

What is the difference between Ensembl and RefSeq transcripts?

Ensembl transcripts are annotated directly on the reference genome, so their sequences match the genome. RefSeq transcripts are based on mRNA sequences and may therefore differ from the reference genome at some positions.

What is the difference between GENCODE and RefSeq?

RefSeq's criteria are more stringent, so there are fewer RefSeq transcripts than Ensembl/GENCODE transcripts. Also, RefSeq transcripts have their own sequences independent of the genome assembly, so certain population-specific variants may be in RefSeq that are entirely missing from the reference genome sequence.

What is RefSeq Select?

RefSeq Select addresses the issue of multiple transcripts per gene and introduces an automated workflow that identifies a single curated RefSeq transcript for every protein-coding gene.

What is the NM prefix in NCBI?

In the accession NM_183124.4, "NM" indicates the molecule type (i.e., protein-coding transcript, or mRNA) and staff-curated processing; "183124" is a six-digit identifier; and the final "4" is the version number. The following table summarizes RefSeq accession numbers.

What are NCBI, NLM, and NIH?

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH).

Article information

Author: Margart Wisoky
